linux-kernel.vger.kernel.org archive mirror
* Re: [RFD] readonly/read-write semantics
@ 2001-09-01  0:38 Bryan Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Bryan Henderson @ 2001-09-01  0:38 UTC (permalink / raw)
  To: Goswin Brederlow
  Cc: Goswin Brederlow, linux-fsdevel, linux-fsdevel-owner,
	linux-kernel, Jean-Marc Saffroy, Linus Torvalds, Alexander Viro


>> 2) I'd like to see a readonly mount state defined as "the filesystem will
>> not change.  Period."  Not for system calls in progress, not for cache
>> synchronization, not to set an "unmounted" flag, not for writes that are
>> queued in the device driver or device...
>That's what readonly does now, isn't it?

No, it's much more sloppy than that today.  The mount is slammed into the
readonly state and writes in progress continue to modify the filesystem
medium.  The remount system call does a weak check to see that there is no
writing going on before proceeding to set the readonly flag on.  A mkdir in
progress continues to complete.  A properly timed open for write could even
slip in.  Then the opener can write away while the mount is in "readonly"
state.  And finally, Al points out that a filesystem driver that gets
scared by some consistency error may stamp its foot and declare "readonly"
state right under the nose of open-for-write file descriptors, which
allow continued writing.
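
A toy model of that window may help.  This is only a sketch in plain C,
not kernel code: toy_mount, toy_may_remount_ro() and the other names are
made up for illustration.  It assumes, as described above, that the
readonly flag is consulted when a file is opened but not on each write
through a descriptor that is already open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical. */
struct toy_mount {
    bool readonly;        /* the "readonly" flag on the mount */
    int  open_writers;    /* descriptors currently open for write */
    int  blocks_written;  /* how much has reached the medium */
};

/* Step 1 of the remount: the weak "is anybody writing?" check. */
bool toy_may_remount_ro(const struct toy_mount *m)
{
    return m->open_writers == 0;
}

/* Step 2 of the remount: set the flag, without re-checking anything. */
void toy_set_readonly(struct toy_mount *m)
{
    m->readonly = true;
}

/* open-for-write looks at the flag; returns true on success. */
bool toy_open_for_write(struct toy_mount *m)
{
    if (m->readonly)
        return false;
    m->open_writers++;
    return true;
}

/* A write through an already open descriptor never looks at the flag. */
void toy_write(struct toy_mount *m)
{
    assert(m->open_writers > 0);
    m->blocks_written++;
}
```

Calling these in the order check, open, set flag, write leaves the mount
flagged readonly with a block written after the flag went on.  A later
open for write is refused, but the descriptor that slipped in keeps
working.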

>But you want that also to be
>when the filesystem is writeable but not being written to at the
>moment, i.e. if it's in a consistent state or "clean".

Well, no I don't.  That's another state.  And the mount goes in and out of
that state all the time without any help from us, so the only issue would
be detecting the state to make it useful.  I used to think Unix filesystems
always did that, but I don't seem to be able to get an ext2 filesystem
(mounted r/w) into no-fsck-required state without actually unmounting it.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
@ 2001-09-05  2:34 Bryan Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Bryan Henderson @ 2001-09-05  2:34 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds


>> So the most you would need
>> to wait for in going into the hard "read only" state I defined is for any
>> page I/O to complete.  And for the "no new writes" state, you just write
>> protect all the pages (and any new ones that fault in too).
>
>It's not that simple.  At the very least you need an equivalent of msync()
>on each of these mappings before you can do anything of that kind.

I agree.  An ordinary remount shouldn't immediately go into hard readonly
state.  It should spend some time in no-new-writes state, during which it
flushes buffered writes, and I include in that dirty VM mapped pages, and
closes the filesystem.
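
The three states can be sketched as a small state machine.  This is a toy
model under stated assumptions, not a proposal for the kernel interface:
the enum values, struct toy_mount and the toy_*() functions are all
hypothetical names, and "flushing" is reduced to zeroing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states: read-write, no-new-writes, hard readonly. */
enum toy_state { TOY_RW, TOY_NO_NEW_WRITES, TOY_HARD_RO };

struct toy_mount {
    enum toy_state state;
    int writes_in_progress;  /* write() calls that have not returned */
    int dirty_pages;         /* buffered data not yet on the medium */
};

/* A write may start only in read-write state.  Whether the file is
 * merely open for write plays no part in the decision. */
bool toy_begin_write(struct toy_mount *m)
{
    if (m->state != TOY_RW)
        return false;
    m->writes_in_progress++;
    return true;
}

/* The write() call returns; its data is now buffered, not yet flushed. */
void toy_end_write(struct toy_mount *m)
{
    assert(m->writes_in_progress > 0);
    m->writes_in_progress--;
    m->dirty_pages++;
}

/* Start the remount: new writes are refused from this point on. */
void toy_start_remount_ro(struct toy_mount *m)
{
    if (m->state == TOY_RW)
        m->state = TOY_NO_NEW_WRITES;
}

/* Try to finish the remount.  Returns true once the mount is in hard
 * readonly state, false while writes are still in progress. */
bool toy_finish_remount_ro(struct toy_mount *m)
{
    if (m->state != TOY_NO_NEW_WRITES)
        return m->state == TOY_HARD_RO;
    if (m->writes_in_progress > 0)
        return false;
    m->dirty_pages = 0;      /* stands in for syncing the cache */
    m->state = TOY_HARD_RO;
    return true;
}
```

Note that nothing here counts open files: a descriptor open for write
does not delay the transition, only a write that has actually started.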

My most basic point underlying all this, though, is that it should _not_
wait for all the files open for write to close (or fail because
they haven't).

I thought there were also emergency cases where the filesystem driver
didn't want any more writing going on for fear of causing more damage.
That's why I mentioned the case where you might want to go straight to hard
readonly state.

>BTW, for real fun think of the situation when you have one of the swap
>components in a regular file on your filesystem.  Do you seriously want
>do_remount() to do automagical swapoff(2) on relevant swap components?

There are all kinds of ways I can shoot myself in the foot by making a
mount readonly that I really want to be writing through.  Is this one
special?

>IMO it's a userland job.

Sounds right to me.  We weren't going to talk about implementation yet,
though.  For starters, it would just be nice to agree what MS_RDONLY means
(and perhaps a few other similar flags).



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 19:50 Bryan Henderson
@ 2001-09-05  2:15 ` Alexander Viro
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Viro @ 2001-09-05  2:15 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds



On Tue, 4 Sep 2001, Bryan Henderson wrote:

> >Uh-oh...  How about shared mappings?
> 
> It's always shared mappings, isn't it?  :-)
> 
> Virtual memory access to the file is even easier, though.  A write in
> progress is an individual store to virtual memory.  The only way you could
> even see it is if a page fault is in progress.  So the most you would need
> to wait for in going into the hard "read only" state I defined is for any
> page I/O to complete.  And for the "no new writes" state, you just write
> protect all the pages (and any new ones that fault in too).

It's not that simple.  At the very least you need an equivalent of msync()
on each of these mappings before you can do anything of that kind.  In
effect, you are describing something very similar to revoke(2).  Which might
make sense, but I really doubt that it's a work for do_remount().
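
For reference, the userland analogue of "msync, then write protect" on a
single shared mapping looks roughly like this.  It is only a sketch of
what a process can do to its own mapping with msync(2) and mprotect(2);
it says nothing about how the kernel would do the same to other
processes' mappings.  flush_and_protect() and demo_shared_mapping() are
hypothetical helper names, and the temporary file under /tmp is an
assumption of the demo.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Flush a shared mapping to its file, then make it read-only.
 * Returns 0 on success, -1 on error. */
int flush_and_protect(void *addr, size_t len)
{
    if (msync(addr, len, MS_SYNC) != 0)
        return -1;
    return mprotect(addr, len, PROT_READ);
}

/* Map a temporary file shared, dirty one page through the mapping,
 * flush and protect it, then read the data back through the descriptor.
 * Returns 0 if everything worked, -1 otherwise. */
int demo_shared_mapping(void)
{
    char path[] = "/tmp/ro-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes on close */

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, "dirty", 5);              /* a store through the mapping */
    int rc = flush_and_protect(p, (size_t)page);

    char buf[5] = { 0 };
    if (rc == 0 && pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)
        rc = -1;
    if (rc == 0 && memcmp(buf, "dirty", sizeof buf) != 0)
        rc = -1;

    munmap(p, (size_t)page);
    close(fd);
    return rc;
}
```

After flush_and_protect() returns 0, a further store through the mapping
would fault (SIGSEGV) instead of dirtying the page again.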

BTW, for real fun think of the situation when you have one of the swap
components in a regular file on your filesystem.  Do you seriously want
do_remount() to do automagical swapoff(2) on relevant swap components?

IMO it's a userland job.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
@ 2001-09-04 19:50 Bryan Henderson
  2001-09-05  2:15 ` Alexander Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Bryan Henderson @ 2001-09-04 19:50 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds


>>
>> 1) I want to see files open for write have nothing to do with it.  Unix
>> open/close is not a transaction, it's just a connection.  Some applications
>> manage to use open/close as a transaction, but we're seeing less and less
>> of that as more sophisticated facilities for transactions become available.
>>
>> How many times have we all been frustrated trying to remount read only when
>> some log file that hasn't been written to for hours is open for write?
>>
>> A file write is in progress when a write() system call hasn't returned, not
>> when the file is open for write.
>
>Uh-oh...  How about shared mappings?

It's always shared mappings, isn't it?  :-)

Virtual memory access to the file is even easier, though.  A write in
progress is an individual store to virtual memory.  The only way you could
even see it is if a page fault is in progress.  So the most you would need
to wait for in going into the hard "read only" state I defined is for any
page I/O to complete.  And for the "no new writes" state, you just write
protect all the pages (and any new ones that fault in too).




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
@ 2001-09-04 18:39 Bryan Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Bryan Henderson @ 2001-09-04 18:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-fsdevel, linux-kernel, Jean-Marc Saffroy, Linus Torvalds,
	Alexander Viro


>Okay, make the definition
>
>"this kernel will not attempt to change anything on that filesystem".

Not quite.  As mentioned a little earlier, you need to add "through this
mount."  It's a state of the mount, not the kernel image.

It's a good point, though, that we shouldn't kid ourselves that having a
mount in read-only state means the filesystem is read-only.  In the cases
where it does, though, it's an especially useful state.





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 10:15             ` Alexander Viro
  2001-09-04 10:20               ` Xavier Bestel
@ 2001-09-04 17:03               ` Pavel Machek
  1 sibling, 0 replies; 23+ messages in thread
From: Pavel Machek @ 2001-09-04 17:03 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Xavier Bestel, Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson,
	linux-fsdevel, Linux Kernel Mailing List, Linus Torvalds

Hi!

> > > Read-only is more complex - in addition to mount side ("does anyone want
> > > it to be r/w") there is a filesystem side ("does fs agree to be r/w")...
> > 
> > How about, say, a reiserfs mounted r/o on a shared partition (loopback
> > over nfs) ? If it contains errors, maybe 2 "clients" will attempt to
> > rollback at the same time. Is the solution to never mount, even r/o,
> > remote journalling fs ?
> 
> ??? Rollback is purely local thing, so NFS client doesn't matter at all.
> And nfsd is just an application running on server, whether it's a kernel
> thread or a normal process.

Notice he said *loopback* over nfs [i.e. mount /nfs/xyzzy -o loop,ro -t 
reiserfs]. Or think nbd.

And yes, being able to mount reiserfs without replaying log is useful.

								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-01  4:23 ` Alexander Viro
  2001-09-01 14:42   ` Ingo Oeser
@ 2001-09-04 16:58   ` Pavel Machek
  1 sibling, 0 replies; 23+ messages in thread
From: Pavel Machek @ 2001-09-04 16:58 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Bryan Henderson, linux-fsdevel, linux-fsdevel-owner,
	linux-kernel, Jean-Marc Saffroy, Linus Torvalds

Hi!

> > 2) I'd like to see a readonly mount state defined as "the filesystem will
> > not change.  Period."  Not for system calls in progress, not for cache
> > synchronization, not to set an "unmounted" flag, not for writes that are
> > queued in the device driver or device.  (That last one may stretch
> > feasibility, but it's a worthy goal anyway).
> 
> It doesn't work.  Think of r/o mounting of remote filesystem.  Do you
> suggest that it should make it impossible to change from other clients?

Okay, make the definition

"this kernel will not attempt to change anything on that filesystem".

This does not necessarily mean -oro should have this semantics, maybe we
need something like -orealro, but we should have some mode when writing on
that disk is taboo. [I need that for suspend-to-disk support: I need to
write suspend data to disk, while I need no one to touch those disks,
because I already took snapshot.]
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 10:59                   ` Xavier Bestel
@ 2001-09-04 11:29                     ` Alexander Viro
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Viro @ 2001-09-04 11:29 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds



On 4 Sep 2001, Xavier Bestel wrote:

> On mar, 2001-09-04 at 12:28, Alexander Viro wrote:
> 
> > > Sorry, I meant journal replaying ... AFAIK, this operation will write on
> > > the media even if mounted r/o.
> > 
> > Ditto - NFS client has no idea of that operation.
> 
> Another client mounting the same fs at the same time will have a rather
> weird idea of that operation.

"weird" == "no".

Client sees what nfsd does. And nfsd is an application, ferchrissake.
It's no different from "another process on server looks at the fs
at the same (what, BTW?) time".


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 10:28                 ` Alexander Viro
@ 2001-09-04 10:59                   ` Xavier Bestel
  2001-09-04 11:29                     ` Alexander Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Xavier Bestel @ 2001-09-04 10:59 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds

On mar, 2001-09-04 at 12:28, Alexander Viro wrote:

> > Sorry, I meant journal replaying ... AFAIK, this operation will write on
> > the media even if mounted r/o.
> 
> Ditto - NFS client has no idea of that operation.

Another client mounting the same fs at the same time will have a rather
weird idea of that operation.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 10:20               ` Xavier Bestel
@ 2001-09-04 10:28                 ` Alexander Viro
  2001-09-04 10:59                   ` Xavier Bestel
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Viro @ 2001-09-04 10:28 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds



On 4 Sep 2001, Xavier Bestel wrote:

> > ??? Rollback is purely local thing, so NFS client doesn't matter at all.
> > And nfsd is just an application running on server, whether it's a kernel
> > thread or a normal process.
> 
> Sorry, I meant journal replaying ... AFAIK, this operation will write on
> the media even if mounted r/o.

Ditto - NFS client has no idea of that operation.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04 10:15             ` Alexander Viro
@ 2001-09-04 10:20               ` Xavier Bestel
  2001-09-04 10:28                 ` Alexander Viro
  2001-09-04 17:03               ` Pavel Machek
  1 sibling, 1 reply; 23+ messages in thread
From: Xavier Bestel @ 2001-09-04 10:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds

On mar, 2001-09-04 at 12:15, Alexander Viro wrote:
> 
> 
> On 4 Sep 2001, Xavier Bestel wrote:
> 
> > On mar, 2001-09-04 at 06:09, Alexander Viro wrote:
> > 
> > > Read-only is more complex - in addition to mount side ("does anyone want
> > > it to be r/w") there is a filesystem side ("does fs agree to be r/w")...
> > 
> > How about, say, a reiserfs mounted r/o on a shared partition (loopback
> > over nfs) ? If it contains errors, maybe 2 "clients" will attempt to
> > rollback at the same time. Is the solution to never mount, even r/o,
> > remote journalling fs ?
> 
> ??? Rollback is purely local thing, so NFS client doesn't matter at all.
> And nfsd is just an application running on server, whether it's a kernel
> thread or a normal process.

Sorry, I meant journal replaying ... AFAIK, this operation will write on
the media even if mounted r/o.

Xav

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04  9:26           ` Xavier Bestel
@ 2001-09-04 10:15             ` Alexander Viro
  2001-09-04 10:20               ` Xavier Bestel
  2001-09-04 17:03               ` Pavel Machek
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Viro @ 2001-09-04 10:15 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds



On 4 Sep 2001, Xavier Bestel wrote:

> On mar, 2001-09-04 at 06:09, Alexander Viro wrote:
> 
> > Read-only is more complex - in addition to mount side ("does anyone want
> > it to be r/w") there is a filesystem side ("does fs agree to be r/w")...
> 
> How about, say, a reiserfs mounted r/o on a shared partition (loopback
> over nfs) ? If it contains errors, maybe 2 "clients" will attempt to
> rollback at the same time. Is the solution to never mount, even r/o,
> remote journalling fs ?

??? Rollback is purely local thing, so NFS client doesn't matter at all.
And nfsd is just an application running on server, whether it's a kernel
thread or a normal process.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04  4:09         ` Alexander Viro
@ 2001-09-04  9:26           ` Xavier Bestel
  2001-09-04 10:15             ` Alexander Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Xavier Bestel @ 2001-09-04  9:26 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Jean-Marc Saffroy, Ingo Oeser, Bryan Henderson, linux-fsdevel,
	Linux Kernel Mailing List, Linus Torvalds

On mar, 2001-09-04 at 06:09, Alexander Viro wrote:

> Read-only is more complex - in addition to mount side ("does anyone want
> it to be r/w") there is a filesystem side ("does fs agree to be r/w")...

How about, say, a reiserfs mounted r/o on a shared partition (loopback
over nfs) ? If it contains errors, maybe 2 "clients" will attempt to
rollback at the same time. Is the solution to never mount, even r/o,
remote journalling fs ?

	Xav

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-04  2:07       ` Jean-Marc Saffroy
@ 2001-09-04  4:09         ` Alexander Viro
  2001-09-04  9:26           ` Xavier Bestel
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Viro @ 2001-09-04  4:09 UTC (permalink / raw)
  To: Jean-Marc Saffroy
  Cc: Ingo Oeser, Bryan Henderson, linux-fsdevel, linux-kernel, Linus Torvalds



On Tue, 4 Sep 2001, Jean-Marc Saffroy wrote:

> > Notice that setups along the lines "mount /dev/sda5 read-only on
> > /home/jail/pub and read-write on /home/ftp/pub" are not that
> > unreasonable, so even for local filesystems it might make sense.
> >
> > IOW, I suspect that right solution would have two separate layers -
> > 	* does anyone get write access under that mountpoint? (VFS)
> > 	* is this fs asked to handle write access and had it agreed with that?
> > (filesystem)
> 
> Then a mount point could be compared to the notion of view in a database,
> right ? Sounds nice.

Well, they _are_ views.  We have two distinct objects - superblock (fs
tree itself) and vfsmount (view into it).  Some of the flags obviously
belong to the latter (e.g. nosuid, nodev and noexec - in -ac they are
per-mountpoint and in effect you can turn them on and off on arbitrary
subtrees - e.g.
mount --bind /home/k1dd13 /home/k1dd13
mount -o remount,noexec /home/k1dd13
will have expected effect).

Read-only is more complex - in addition to mount side ("does anyone want
it to be r/w") there is a filesystem side ("does fs agree to be r/w")...
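
The two sides can be pictured with a toy model.  This is a sketch only:
toy_super and toy_vfsmount are simplified stand-ins for the superblock
and vfsmount objects, and the field names and toy_may_write() are
invented for illustration, not taken from the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Filesystem side: one per fs tree.  "Does the fs agree to be r/w?" */
struct toy_super {
    bool fs_refuses_writes;
};

/* Mount side: one per view into the tree.  "Does anyone want r/w here?" */
struct toy_vfsmount {
    struct toy_super *sb;
    bool ro_view;
};

/* A write through a given mount needs both layers to say yes. */
bool toy_may_write(const struct toy_vfsmount *mnt)
{
    if (mnt->ro_view)
        return false;                   /* this view never asked for r/w */
    return !mnt->sb->fs_refuses_writes; /* the fs may still say no */
}
```

With one toy_super shared by a read-only view and a read-write view, only
the second one allows writes, and both refuse once the filesystem side
sets fs_refuses_writes.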


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-01 16:44     ` Alexander Viro
  2001-09-01 17:13       ` Nicholas Knight
@ 2001-09-04  2:07       ` Jean-Marc Saffroy
  2001-09-04  4:09         ` Alexander Viro
  1 sibling, 1 reply; 23+ messages in thread
From: Jean-Marc Saffroy @ 2001-09-04  2:07 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Ingo Oeser, Bryan Henderson, linux-fsdevel, linux-kernel, Linus Torvalds

On Sat, 1 Sep 2001, Alexander Viro wrote:

> IMO a part of the problem is that we are mixing "I'm not asking that
> to be writable" with "I won't let you write".  The former belongs
> to the mounting side, the latter - to filesystem.

In addition to this, I would like to be sure that a (local or remote)
file system that is not mounted r/w will not be affected by local activity
(eg. not even if I pull the power cord).

> Notice that setups along the lines "mount /dev/sda5 read-only on
> /home/jail/pub and read-write on /home/ftp/pub" are not that
> unreasonable, so even for local filesystems it might make sense.
>
> IOW, I suspect that right solution would have two separate layers -
> 	* does anyone get write access under that mountpoint? (VFS)
> 	* is this fs asked to handle write access and had it agreed with that?
> (filesystem)

Then a mount point could be compared to the notion of view in a database,
right ? Sounds nice.

-- 
Jean-Marc Saffroy - Research Engineer - Silicomp Research Institute
mailto:saffroy@ri.silicomp.fr


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
  2001-09-01 13:08   ` Juan Quintela
@ 2001-09-04  1:16   ` Jean-Marc Saffroy
  1 sibling, 0 replies; 23+ messages in thread
From: Jean-Marc Saffroy @ 2001-09-04  1:16 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, linux-fsdevel, Linus Torvalds

On Fri, 31 Aug 2001, Alexander Viro wrote:

> > In 2.4.9, I have encountered a strange condition while playing with file
> > structs chained on a superblock list (sb->s_files) : some of them can have
> > a NULL f_dentry pointer. The only case I found which can cause this is
> > when fput is called and f_count drops to zero. Is that the only case ?
>
> Yes, it is, and yes, it's legitimate - code that scans that list should
> (and in-tree one does) deal with such case.

AFAICT fput (and also dentry_open, BTW) nullifies f_dentry without any
lock held, so code that scans the list (such as fs_may_remount_ro, I
haven't looked for other instances) can never assume that a file struct
found in the list has or even will keep what looks like a valid f_dentry.

> fs_may_remount_ro() is, indeed, racy and had been since very long.

Sure, let's consider code in fs_may_remount_ro :

	file_list_lock();
	/* loop over files in sb->s_files */
		if (!file->f_dentry)
			continue;
		/* now a concurrent fput may set f_dentry to NULL */
		inode = file->f_dentry->d_inode; /* oops */

Maybe the file struct should be removed from the list /before/ f_dentry is
assigned NULL ?
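
A userland sketch of that ordering, with C11 atomics standing in for
whatever the kernel would really use.  All names (toy_file, toy_fput(),
toy_count_writers()) are hypothetical, and the model is single-threaded:
in the kernel the list walk and the unlink would both sit under
file_list_lock(), which is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_dentry {
    int writable;
};

struct toy_file {
    _Atomic(struct toy_dentry *) f_dentry;
    struct toy_file *next;
};

/* Scan the list and count files whose dentry is writable.  The pointer
 * is loaded exactly once per file, so the NULL test and the dereference
 * always see the same value. */
int toy_count_writers(struct toy_file *head)
{
    int n = 0;
    for (struct toy_file *f = head; f != NULL; f = f->next) {
        struct toy_dentry *d = atomic_load(&f->f_dentry);
        if (d == NULL)
            continue;
        if (d->writable)
            n++;
    }
    return n;
}

/* Release a file: unlink it from the list first, and only then clear
 * f_dentry, so a file still on the list never has a NULL dentry because
 * of this function. */
void toy_fput(struct toy_file **head, struct toy_file *victim)
{
    for (struct toy_file **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    victim->next = NULL;
    atomic_store(&victim->f_dentry, (struct toy_dentry *)NULL);
}
```

The single load only removes the check-then-dereference window.  It does
not by itself keep the dentry alive while the scanner looks at it, which
is a separate lifetime question.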

> However, the main problem is not in opening something after the
> check - the check itself is not exact enough.

I agree fs_may_remount_ro can report wrong results (ie. "you may remount
ro" while you really can't) because of how it is used, but as stated
above, I think it also has a small but real potential for directly
crashing the system, and should be fixed.


-- 
Jean-Marc Saffroy - Research Engineer - Silicomp Research Institute
mailto:saffroy@ri.silicomp.fr


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-01 16:44     ` Alexander Viro
@ 2001-09-01 17:13       ` Nicholas Knight
  2001-09-04  2:07       ` Jean-Marc Saffroy
  1 sibling, 0 replies; 23+ messages in thread
From: Nicholas Knight @ 2001-09-01 17:13 UTC (permalink / raw)
  To: Alexander Viro, Ingo Oeser
  Cc: Bryan Henderson, linux-fsdevel, linux-kernel, Linus Torvalds

On Saturday 01 September 2001 09:44 am, Alexander Viro wrote:
> On Sat, 1 Sep 2001, Ingo Oeser wrote:
> > On Sat, Sep 01, 2001 at 12:23:05AM -0400, Alexander Viro wrote:
> > > > 2) I'd like to see a readonly mount state defined as "the
> > > > filesystem will not change.  Period."  Not for system calls in
> > > > progress, not for cache synchronization, not to set an
> > > > "unmounted" flag, not for writes that are queued in the device
> > > > driver or device.  (That last one may stretch feasibility, but
> > > > it's a worthy goal anyway).
> > >
> > > It doesn't work.  Think of r/o mounting of remote filesystem.  Do
> > > you suggest that it should make it impossible to change from other
> > > clients?
> >
> > It's sufficient for local file systems. Or see it this way: The
> > machine, that mounted it r/o will NOT write to it until it is
> > mounted r/w again.
>
> That's _also_ not true for remote filesystems.  We can mount the same
> filesystem over NFS again without unmounting the old instance.  Always
> could.
>
> IMO a part of the problem is that we are mixing "I'm not asking that
> to be writable" with "I won't let you write".  The former belongs
> to the mounting side, the latter - to filesystem.

It's really a band-aid, I seriously doubt anybody is going to claim that 
it's "perfect".
The state that he (and I for that matter) want is "This is mounted, we 
can read from it, but under *NO CIRCUMSTANCES* will we change the 
filesystem through this mount, ever."
In addition to the filesystem-stamping-its-foot situation, this could 
help if someone is testing a new, potentially unstable driver (filesystem
or block device) and wants to stop all writes IMMEDIATELY so that they
can check the data present on that filesystem/device.

Again, this isn't perfect, but I think it would have many potential uses 
(filesystem error would probably be the most useful application) and 
really should have been implemented long ago.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-01 14:42   ` Ingo Oeser
@ 2001-09-01 16:44     ` Alexander Viro
  2001-09-01 17:13       ` Nicholas Knight
  2001-09-04  2:07       ` Jean-Marc Saffroy
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Viro @ 2001-09-01 16:44 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Bryan Henderson, linux-fsdevel, linux-kernel, Linus Torvalds



On Sat, 1 Sep 2001, Ingo Oeser wrote:

> On Sat, Sep 01, 2001 at 12:23:05AM -0400, Alexander Viro wrote:
> > > 2) I'd like to see a readonly mount state defined as "the filesystem will
> > > not change.  Period."  Not for system calls in progress, not for cache
> > > synchronization, not to set an "unmounted" flag, not for writes that are
> > > queued in the device driver or device.  (That last one may stretch
> > > feasibility, but it's a worthy goal anyway).
> > 
> > It doesn't work.  Think of r/o mounting of remote filesystem.  Do you
> > suggest that it should make it impossible to change from other clients?
> 
> It's sufficient for local file systems. Or see it this way: The
> machine, that mounted it r/o will NOT write to it until it is
> mounted r/w again.

That's _also_ not true for remote filesystems.  We can mount the same
filesystem over NFS again without unmounting the old instance.  Always
could.

IMO a part of the problem is that we are mixing "I'm not asking that
to be writable" with "I won't let you write".  The former belongs
to the mounting side, the latter - to filesystem.

Notice that setups along the lines "mount /dev/sda5 read-only on /home/jail/pub
and read-write on /home/ftp/pub" are not that unreasonable, so even for local
filesystems it might make sense.

IOW, I suspect that right solution would have two separate layers -
	* does anyone get write access under that mountpoint? (VFS)
	* is this fs asked to handle write access and had it agreed with that?
(filesystem)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-09-01  4:23 ` Alexander Viro
@ 2001-09-01 14:42   ` Ingo Oeser
  2001-09-01 16:44     ` Alexander Viro
  2001-09-04 16:58   ` Pavel Machek
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Oeser @ 2001-09-01 14:42 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Bryan Henderson, linux-fsdevel, linux-kernel, Linus Torvalds

On Sat, Sep 01, 2001 at 12:23:05AM -0400, Alexander Viro wrote:
> > 2) I'd like to see a readonly mount state defined as "the filesystem will
> > not change.  Period."  Not for system calls in progress, not for cache
> > synchronization, not to set an "unmounted" flag, not for writes that are
> > queued in the device driver or device.  (That last one may stretch
> > feasibility, but it's a worthy goal anyway).
> 
> It doesn't work.  Think of r/o mounting of remote filesystem.  Do you
> suggest that it should make it impossible to change from other clients?

It's sufficient for local file systems. Or see it this way: The
machine, that mounted it r/o will NOT write to it until it is
mounted r/w again.

I also like the "kill/finish all outstanding writes" idea (kill
or finish should depend on the MNT_FORCE option). Once we've
implemented it, forcible unmount becomes trivial ;-)

Forcible unmount is high on the wish list of several admins I
know.

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
@ 2001-09-01 13:08   ` Juan Quintela
  2001-09-04  1:16   ` Jean-Marc Saffroy
  1 sibling, 0 replies; 23+ messages in thread
From: Juan Quintela @ 2001-09-01 13:08 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, Jean-Marc Saffroy, linux-fsdevel, Linus Torvalds

>>>>> "alexander" == Alexander Viro <viro@math.psu.edu> writes:


Hi

viro> What we need is a "I want rw access to fs"/"I give up rw access"/"make
viro> it ro" set of primitives.  Unfortunately, it's even more complicated -
viro> e.g. fs may stomp its foot and set MS_RDONLY in ->s_flags (e.g. upon
viro> finding an error if it has such policy).  That DOESN'T look for files
viro> opened for write (reasonable) and DOESN'T revoke write access to them.

I would really like that for supermount.  Supermount tries to do this
by hand, and it mostly fails because it is difficult.  It tries to keep
the underlying fs unmounted while nobody has open files on it, and
mounted rw only when somebody has a file open for write or when an
operation that needs write access is in progress.  As we don't have an
easy way to check whether we are able to write to a filesystem (we can
only use the IS_RDONLY() macro), I have to mount the filesystem rw just
to be able to call permission() on it.  Notice that permission() doesn't
need write access per se, but IS_RDONLY() only returns false when the
filesystem is mounted rw.  Yes, I can hack the macro to do what I need,
but that means that everybody who needs that functionality will also
have to hack it :(

viro> Again, the main issue here is what do we want, not how to implement it.
viro> Flame away.

I would like a method in the inode/super_block (I don't care which of
them) for:
      - is_read_only_fs()?
          Notice that this method tells us whether we are able to have
          the fs rw, not necessarily whether the fs is rw at the moment.
      - get_write_access()
      - put_write_access()

Notice that the get_write_access() and put_write_access() functions
already exist in the tree, and I would be really happy if there were a
way to hook fs-specific information there, as it would make a lot of the
code in supermount really easy, and the same for other filesystems that
need similar semantics.
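
One possible shape for such hooks, as a sketch only.  The operations
table, struct toy_sb and the supermount-like callbacks are hypothetical;
the in-tree get_write_access() and put_write_access() have different
signatures, and a real implementation would return an errno value such
as -EROFS rather than -1.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sb;

/* Hypothetical per-filesystem hooks for the three methods listed above. */
struct toy_sb_ops {
    bool (*is_read_only_fs)(struct toy_sb *sb);   /* can it ever be rw? */
    int  (*get_write_access)(struct toy_sb *sb);  /* 0 on success, -1 if not */
    void (*put_write_access)(struct toy_sb *sb);
};

struct toy_sb {
    const struct toy_sb_ops *ops;
    bool medium_write_protected;  /* e.g. a write-protected floppy */
    bool mounted_rw;              /* state of the underlying mount */
    int  writers;                 /* outstanding write accesses */
};

/* Supermount-like policy: rw only while somebody holds write access. */
static bool sm_is_read_only_fs(struct toy_sb *sb)
{
    return sb->medium_write_protected;
}

static int sm_get_write_access(struct toy_sb *sb)
{
    if (sb->medium_write_protected)
        return -1;
    if (sb->writers++ == 0)
        sb->mounted_rw = true;    /* first writer: go rw */
    return 0;
}

static void sm_put_write_access(struct toy_sb *sb)
{
    assert(sb->writers > 0);
    if (--sb->writers == 0)
        sb->mounted_rw = false;   /* last writer gone: back to ro */
}

const struct toy_sb_ops supermount_like_ops = {
    .is_read_only_fs  = sm_is_read_only_fs,
    .get_write_access = sm_get_write_access,
    .put_write_access = sm_put_write_access,
};
```

The point of the indirection is that generic code can ask "could this fs
be written?" without the filesystem having to be mounted rw at that
moment.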

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
  2001-08-31 23:35 Bryan Henderson
@ 2001-09-01  4:23 ` Alexander Viro
  2001-09-01 14:42   ` Ingo Oeser
  2001-09-04 16:58   ` Pavel Machek
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Viro @ 2001-09-01  4:23 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: linux-fsdevel, linux-fsdevel-owner, linux-kernel,
	Jean-Marc Saffroy, Linus Torvalds



On Fri, 31 Aug 2001, Bryan Henderson wrote:

> 
> 1) I want to see files open for write have nothing to do with it.  Unix
> open/close is not a transaction, it's just a connection.  Some applications
> manage to use open/close as a transaction, but we're seeing less and less
> of that as more sophisticated facilities for transactions become available.
>
> How many times have we all been frustrated trying to remount read only when
> some log file that hasn't been written to for hours is open for write?
> 
> A file write is in progress when a write() system call hasn't returned, not
> when the file is open for write.

Uh-oh...  How about shared mappings?

> 2) I'd like to see a readonly mount state defined as "the filesystem will
> not change.  Period."  Not for system calls in progress, not for cache
> synchronization, not to set an "unmounted" flag, not for writes that are
> queued in the device driver or device.  (That last one may stretch
> feasibility, but it's a worthy goal anyway).

It doesn't work.  Think of r/o mounting of remote filesystem.  Do you
suggest that it should make it impossible to change from other clients?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFD] readonly/read-write semantics
@ 2001-08-31 23:35 Bryan Henderson
  2001-09-01  4:23 ` Alexander Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Bryan Henderson @ 2001-08-31 23:35 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-fsdevel-owner, linux-kernel,
	Jean-Marc Saffroy, Linus Torvalds


1) I want to see files open for write have nothing to do with it.  Unix
open/close is not a transaction, it's just a connection.  Some applications
manage to use open/close as a transaction, but we're seeing less and less
of that as more sophisticated facilities for transactions become available.

How many times have we all been frustrated trying to remount read only when
some log file that hasn't been written to for hours is open for write?

A file write is in progress when a write() system call hasn't returned, not
when the file is open for write.

Someone who wants to coordinate his mounts with the applications that use
them should use an external locking scheme.

2) I'd like to see a readonly mount state defined as "the filesystem will
not change.  Period."  Not for system calls in progress, not for cache
synchronization, not to set an "unmounted" flag, not for writes that are
queued in the device driver or device.  (That last one may stretch
feasibility, but it's a worthy goal anyway).

3) A system call to put a mount into readonly state should not return until
all writes in progress have completed out to the medium, and the cache is
clean.  It should sync the cache, of course, and do whatever closing of the
filesystem an unmount would do.  Any attempt to start a new write during
this wait (which constitutes another mount state) should fail.

I was thinking an option to fail immediately instead of waiting for writes
to complete might be useful, but then I couldn't think of any write in
progress that would take enough time to make it worthwhile.  As long as any
new system call counts as a new write.

The same thinking applies to an option to kill writes in progress without
waiting.  Unless maybe it means to skip the cache synchronization.

4) I don't think it has any semantic relevance, but as part of this, I'd
also like to see the FS implementation stop considering read only mount
status to be a file permission issue.  (Today, it does in some places, but
doesn't in others).

I don't know enough about how filesystem drivers use the "readonly" state
today for damage control when errors happen, so I won't give an opinion on
that.  But it sounds like it would probably be that quiescing state I
mentioned in (3), not the readonly state I mentioned in (2).
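
[Illustration, not part of the original message.]  The behaviour asked
for in (3) can be sketched in userspace C with a counter of writes in
flight and a condition variable.  This is only a sketch under stated
assumptions: struct mnt_gate, the MNT_* states and the mnt_*() functions
are hypothetical names, not kernel interfaces, and the cache sync is
left as a comment.  The remount call first enters a quiescing state, so
new writes fail at once, then waits for writes already in progress to
finish before declaring the mount read-only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical mount states: read-write, draining writers, read-only. */
enum mnt_state { MNT_RW, MNT_QUIESCING, MNT_RO };

struct mnt_gate {
    pthread_mutex_t lock;
    pthread_cond_t  idle;       /* signalled when in_flight drops to 0 */
    enum mnt_state  state;
    unsigned long   in_flight;  /* write calls that have not returned */
};

#define MNT_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, MNT_RW, 0 }

/* Called at the start of every modifying system call. */
int mnt_write_begin(struct mnt_gate *m)
{
    int err = 0;

    pthread_mutex_lock(&m->lock);
    if (m->state != MNT_RW)
        err = -EROFS;           /* quiescing or read-only: refuse */
    else
        m->in_flight++;
    pthread_mutex_unlock(&m->lock);
    return err;
}

/* Called when the modifying system call returns. */
void mnt_write_end(struct mnt_gate *m)
{
    pthread_mutex_lock(&m->lock);
    assert(m->in_flight > 0);
    if (--m->in_flight == 0)
        pthread_cond_broadcast(&m->idle);
    pthread_mutex_unlock(&m->lock);
}

/* Does not return until every write in progress has completed. */
void mnt_remount_ro(struct mnt_gate *m)
{
    pthread_mutex_lock(&m->lock);
    m->state = MNT_QUIESCING;   /* new writes fail from here on */
    while (m->in_flight > 0)
        pthread_cond_wait(&m->idle, &m->lock);
    /* A real implementation would sync the cache to the medium here. */
    m->state = MNT_RO;
    pthread_mutex_unlock(&m->lock);
}
```

Note that an open file descriptor plays no role here: only a call
bracketed by mnt_write_begin()/mnt_write_end() counts as a write in
progress, which matches point (1).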



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFD] readonly/read-write semantics
  2001-08-31 17:18 [Q] [VFS] NULL f_dentry for opened files ; possible race condition Jean-Marc Saffroy
@ 2001-08-31 20:56 ` Alexander Viro
  2001-09-01 13:08   ` Juan Quintela
  2001-09-04  1:16   ` Jean-Marc Saffroy
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander Viro @ 2001-08-31 20:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jean-Marc Saffroy, linux-fsdevel, Linus Torvalds


Folks, stuff below is, IMNSHO, worth a discussion.  Please, give it
a thought.  Short summary: we need well-defined semantics for
read-only/read-write. Currently we don't have any.

On Fri, 31 Aug 2001, Jean-Marc Saffroy wrote:

(first the trivial part)

> Hello,
> 
> 
> In 2.4.9, I have encountered a strange condition while playing with file
> structs chained on a superblock list (sb->s_files) : some of them can have
> a NULL f_dentry pointer. The only case I found which can cause this is
> when fput is called and f_count drops to zero. Is that the only case ?

Yes, it is, and yes, it's legitimate - code that scans that list should
(and in-tree one does) deal with such case.
 
> While exploring the corresponding code for an explanation, I found what
> looks like a possible race condition : do_remount_sb calls
> fs_may_remount_ro, and only then uses lock_super to do the actual remount.
> 
> Isn't it possible for a program to open a file for writing just after
> fs_may_remount_ro ? The whole thing seems to be protected by the BKL and
> mount_sem, but I guess it won't stop an open.

... and here comes the serious stuff.

mount_sem, BKL and lock_super() have nothing to do with the checks done
in open().

fs_may_remount_ro() is, indeed, racy and has been for a very long time.
However, the main problem is not in opening something after the
check - the check itself is not exact enough.

Think what happens if the object we hold for writing doesn't currently
have struct file.  At all.  E.g. it is a directory in the middle of
subdirectory creation.

Or, for that matter, combine that with "what happens if we do ...
after we've done the checks" - e.g. consider mkdir() called after
the check.  Or unlink() on opened file driving ->i_nlink to 0.

What we need is a "I want rw access to fs"/"I give up rw access"/"make
it ro" set of primitives.  Unfortunately, it's even more complicated -
e.g. fs may stomp its foot and set MS_RDONLY in ->s_flags (e.g. upon
finding an error if it has such policy).  That DOESN'T look for files
opened for write (reasonable) and DOESN'T revoke write access to them.

So you end up with fs that is claimed to be r/o, but people still have
files opened for write.

We need clear semantics for readonly/read-write state of filesystems.
Until then all we have is "well, if you go single-user before remount
or otherwise prevent users from accessing the mountpoint - you should be OK".

Which, BTW, is not _too_ unreasonable, since otherwise you are gambling
on the fact that users won't make the sucker busy at the wrong moment.
A cleaner solution would be "revoke everyone's write access and remount
r/o", but that will take quite an effort.  Which might be worth doing.

Again, the main issue here is what do we want, not how to implement it.
Flame away.
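
[Illustration, not part of the original message.]  The "I want rw access
to fs"/"I give up rw access"/"make it ro" primitives described above
could look roughly like this in userspace C.  It is a minimal sketch,
assuming a per-filesystem writer count; struct fs_access and the
fs_*() functions are hypothetical names and are distinct from the
kernel's per-inode get_write_access()/put_write_access().  The point it
shows is that the busy check and the switch to read-only happen under
one lock, so nothing can slip in between them the way it can after
fs_may_remount_ro().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical per-filesystem state; not a kernel structure. */
enum fs_state { FS_RW, FS_RO };

struct fs_access {
    pthread_mutex_t lock;
    enum fs_state   state;
    unsigned long   writers;    /* current holders of rw access */
};

#define FS_ACCESS_INIT { PTHREAD_MUTEX_INITIALIZER, FS_RW, 0 }

/* "I want rw access to fs": taken by open-for-write, mkdir, unlink, ... */
int fs_want_write(struct fs_access *fs)
{
    int err = 0;

    pthread_mutex_lock(&fs->lock);
    if (fs->state == FS_RO)
        err = -EROFS;
    else
        fs->writers++;
    pthread_mutex_unlock(&fs->lock);
    return err;
}

/* "I give up rw access". */
void fs_drop_write(struct fs_access *fs)
{
    pthread_mutex_lock(&fs->lock);
    assert(fs->writers > 0);
    fs->writers--;
    pthread_mutex_unlock(&fs->lock);
}

/* "Make it ro": the check and the state change are one atomic step. */
int fs_make_ro(struct fs_access *fs)
{
    int err = 0;

    pthread_mutex_lock(&fs->lock);
    if (fs->writers > 0)
        err = -EBUSY;           /* someone still holds rw access */
    else
        fs->state = FS_RO;
    pthread_mutex_unlock(&fs->lock);
    return err;
}
```

Because the count belongs to the filesystem rather than to a struct
file, a mkdir() in progress can hold rw access without any file being
open.  The sketch does not cover the case where the fs sets MS_RDONLY
on its own after an error; that would need some way to revoke access
already granted.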


^ permalink raw reply	[flat|nested] 23+ messages in thread

Thread overview: 23+ messages
2001-09-01  0:38 [RFD] readonly/read-write semantics Bryan Henderson
  -- strict thread matches above, loose matches on Subject: below --
2001-09-05  2:34 Bryan Henderson
2001-09-04 19:50 Bryan Henderson
2001-09-05  2:15 ` Alexander Viro
2001-09-04 18:39 Bryan Henderson
2001-08-31 23:35 Bryan Henderson
2001-09-01  4:23 ` Alexander Viro
2001-09-01 14:42   ` Ingo Oeser
2001-09-01 16:44     ` Alexander Viro
2001-09-01 17:13       ` Nicholas Knight
2001-09-04  2:07       ` Jean-Marc Saffroy
2001-09-04  4:09         ` Alexander Viro
2001-09-04  9:26           ` Xavier Bestel
2001-09-04 10:15             ` Alexander Viro
2001-09-04 10:20               ` Xavier Bestel
2001-09-04 10:28                 ` Alexander Viro
2001-09-04 10:59                   ` Xavier Bestel
2001-09-04 11:29                     ` Alexander Viro
2001-09-04 17:03               ` Pavel Machek
2001-09-04 16:58   ` Pavel Machek
2001-08-31 17:18 [Q] [VFS] NULL f_dentry for opened files ; possible race condition Jean-Marc Saffroy
2001-08-31 20:56 ` [RFD] readonly/read-write semantics Alexander Viro
2001-09-01 13:08   ` Juan Quintela
2001-09-04  1:16   ` Jean-Marc Saffroy
