linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* FMODE_EXEC or alike?
@ 2006-02-20 22:19 Oleg Drokin
  2006-02-21  5:51 ` Andrew Morton
  2006-02-21 10:39 ` Christoph Hellwig
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Drokin @ 2006-02-20 22:19 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel

Hello!

   We are working on a lustre client that would not require any patches
   to linux kernel. And there are few things that would be nice to have
   that I'd like your input on.

   One of those is FMODE_EXEC - to correctly detect cross-node situations with
   executing a file that is opened for write or the other way around, we need
   something like this extra file mode to be present (and used as a file open
   mode when opening files for exection, e.g. in fs/exec.c)
   Do you think there is a chance this can be included into vanilla kernel,
   or is there a better solution I oversee?
   I am just thinking about something as simple as this
   (with some suitable FMODE_EXEC define, of course):

--- linux/fs/exec.c.orig	2006-02-21 00:11:47.000000000 +0200
+++ linux/fs/exec.c	2006-02-21 00:12:24.000000000 +0200
@@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
 	struct nameidata nd;
 	int error;
 
-	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
+	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
 	if (error)
 		goto out;
 
@@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
 	int err;
 	struct file *file;
 
-	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ);
+	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
 	file = ERR_PTR(err);
 
 	if (!err) {

   Thanks.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-20 22:19 FMODE_EXEC or alike? Oleg Drokin
@ 2006-02-21  5:51 ` Andrew Morton
  2006-02-21 11:30   ` Oleg Drokin
  2006-02-21 13:59   ` Trond Myklebust
  2006-02-21 10:39 ` Christoph Hellwig
  1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2006-02-21  5:51 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel, linux-fsdevel

Oleg Drokin <green@linuxhacker.ru> wrote:
>
> Hello!
> 
>    We are working on a lustre client that would not require any patches
>    to linux kernel. And there are few things that would be nice to have
>    that I'd like your input on.
> 
>    One of those is FMODE_EXEC - to correctly detect cross-node situations with
>    executing a file that is opened for write or the other way around, we need
>    something like this extra file mode to be present (and used as a file open
>    mode when opening files for exection, e.g. in fs/exec.c)
>    Do you think there is a chance this can be included into vanilla kernel,
>    or is there a better solution I oversee?
>    I am just thinking about something as simple as this
>    (with some suitable FMODE_EXEC define, of course):
> 
> --- linux/fs/exec.c.orig	2006-02-21 00:11:47.000000000 +0200
> +++ linux/fs/exec.c	2006-02-21 00:12:24.000000000 +0200
> @@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
>  	struct nameidata nd;
>  	int error;
>  
> -	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
> +	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
>  	if (error)
>  		goto out;
>  
> @@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
>  	int err;
>  	struct file *file;
>  
> -	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ);
> +	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
>  	file = ERR_PTR(err);
>  
>  	if (!err) {
> 

Such a patch would have zero runtime cost.  I'd have no problem carrying
that if it makes things easier for lustre, personally.

We would need to understand whether this is needed by other distributed
filesystems and if so, whether the proposed implementation is suitable and
sufficient.

Please send a well-tested patch, include suitable comments around the
definition of FMODE_EXEC.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-20 22:19 FMODE_EXEC or alike? Oleg Drokin
  2006-02-21  5:51 ` Andrew Morton
@ 2006-02-21 10:39 ` Christoph Hellwig
  2006-02-22  1:03   ` Chris Wedgwood
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2006-02-21 10:39 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel, linux-fsdevel

On Tue, Feb 21, 2006 at 12:19:48AM +0200, Oleg Drokin wrote:
> Hello!
> 
>    We are working on a lustre client that would not require any patches
>    to linux kernel. And there are few things that would be nice to have
>    that I'd like your input on.
> 
>    One of those is FMODE_EXEC - to correctly detect cross-node situations with
>    executing a file that is opened for write or the other way around, we need
>    something like this extra file mode to be present (and used as a file open
>    mode when opening files for exection, e.g. in fs/exec.c)
>    Do you think there is a chance this can be included into vanilla kernel,
>    or is there a better solution I oversee?
>    I am just thinking about something as simple as this
>    (with some suitable FMODE_EXEC define, of course):

The patch looks fine to me.  We can put it in once we'll put in the
full lustre client.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21  5:51 ` Andrew Morton
@ 2006-02-21 11:30   ` Oleg Drokin
  2006-02-21 11:36     ` Andrew Morton
  2006-02-21 13:59   ` Trond Myklebust
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Drokin @ 2006-02-21 11:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

Hello!

  Introduce FMODE_EXEC file flag, to indicate that file is being opened for
  execution. This is useful for distributed filesystems to maintain consistent
  behavior for returning ETXTBUSY when opening for write and execution
  happens on different nodes.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>

--- linux-2.6.16-rc4/include/linux/fs.h.orig	2006-02-21 11:26:43.000000000 +0200
+++ linux-2.6.16-rc4/include/linux/fs.h	2006-02-21 11:30:16.000000000 +0200
@@ -65,6 +65,11 @@ extern int dir_notify_enable;
 #define FMODE_PREAD	8
 #define FMODE_PWRITE	FMODE_PREAD	/* These go hand in hand */
 
+/* File is being opened for execution. Primary users of this flag are
+   distributed filesystems that can use it to achieve correct ETXTBUSY
+   behavior for cross-node execution/opening_for_writing of files */
+#define FMODE_EXEC	16
+
 #define RW_MASK		1
 #define RWA_MASK	2
 #define READ 0
--- linux-2.6.16-rc4/fs/exec.c.orig	2006-02-19 20:34:06.000000000 +0200
+++ linux-2.6.16-rc4/fs/exec.c	2006-02-21 13:06:42.000000000 +0200
@@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
 	struct nameidata nd;
 	int error;
 
-	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
+	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
 	if (error)
 		goto out;
 
@@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
 	int err;
 	struct file *file;
 
-	err = path_lookup_open(AT_FDCWD, name, LOOKUP_FOLLOW, &nd, FMODE_READ);
+	err = path_lookup_open(AT_FDCWD, name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
 	file = ERR_PTR(err);
 
 	if (!err) {

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 11:30   ` Oleg Drokin
@ 2006-02-21 11:36     ` Andrew Morton
  2006-02-21 11:56       ` Oleg Drokin
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-02-21 11:36 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel, linux-fsdevel

Oleg Drokin <green@linuxhacker.ru> wrote:
>
>   Introduce FMODE_EXEC file flag, to indicate that file is being opened for
>    execution. This is useful for distributed filesystems to maintain consistent
>    behavior for returning ETXTBUSY when opening for write and execution
>    happens on different nodes.

You forgot something.

Are other clustered/distributed filesystems likely to need something like
this and if so, is this implementation sufficient for their purposes?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 11:36     ` Andrew Morton
@ 2006-02-21 11:56       ` Oleg Drokin
  0 siblings, 0 replies; 22+ messages in thread
From: Oleg Drokin @ 2006-02-21 11:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

Hello!

On Tue, Feb 21, 2006 at 03:36:05AM -0800, Andrew Morton wrote:
> >   Introduce FMODE_EXEC file flag, to indicate that file is being opened for
> >    execution. This is useful for distributed filesystems to maintain consistent
> >    behavior for returning ETXTBUSY when opening for write and execution
> >    happens on different nodes.
> You forgot something.
> Are other clustered/distributed filesystems likely to need something like
> this and if so, is this implementation sufficient for their purposes?

No, I did not.
I CCed fsdevel originally in part because other clustered/distributed
filesystems developers are likely to read lkml or fsdevel and might share their
opinions about the patch and the approach itself.
Actually I am not sure if any other non-local filesystem tries to be as close
to local fs in semantic as Lustre does.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21  5:51 ` Andrew Morton
  2006-02-21 11:30   ` Oleg Drokin
@ 2006-02-21 13:59   ` Trond Myklebust
  2006-02-21 14:15     ` Antonio Vargas
  2006-02-21 23:26     ` J. Bruce Fields
  1 sibling, 2 replies; 22+ messages in thread
From: Trond Myklebust @ 2006-02-21 13:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Oleg Drokin, linux-kernel, linux-fsdevel

On Mon, 2006-02-20 at 21:51 -0800, Andrew Morton wrote:
> Oleg Drokin <green@linuxhacker.ru> wrote:
> >
> > Hello!
> > 
> >    We are working on a lustre client that would not require any patches
> >    to linux kernel. And there are few things that would be nice to have
> >    that I'd like your input on.
> > 
> >    One of those is FMODE_EXEC - to correctly detect cross-node situations with
> >    executing a file that is opened for write or the other way around, we need
> >    something like this extra file mode to be present (and used as a file open
> >    mode when opening files for exection, e.g. in fs/exec.c)
> >    Do you think there is a chance this can be included into vanilla kernel,
> >    or is there a better solution I oversee?
> >    I am just thinking about something as simple as this
> >    (with some suitable FMODE_EXEC define, of course):
> > 
> > --- linux/fs/exec.c.orig	2006-02-21 00:11:47.000000000 +0200
> > +++ linux/fs/exec.c	2006-02-21 00:12:24.000000000 +0200
> > @@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
> >  	struct nameidata nd;
> >  	int error;
> >  
> > -	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > +	error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> >  	if (error)
> >  		goto out;
> >  
> > @@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
> >  	int err;
> >  	struct file *file;
> >  
> > -	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > +	err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> >  	file = ERR_PTR(err);
> >  
> >  	if (!err) {
> > 
> 
> Such a patch would have zero runtime cost.  I'd have no problem carrying
> that if it makes things easier for lustre, personally.
> 
> We would need to understand whether this is needed by other distributed
> filesystems and if so, whether the proposed implementation is suitable and
> sufficient.

Hmm.... We might possibly want to use that for NFSv4 at some point in
order to deny write access to the file to other clients while it is in
use.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 13:59   ` Trond Myklebust
@ 2006-02-21 14:15     ` Antonio Vargas
  2006-02-21 14:21       ` Oleg Drokin
  2006-02-21 14:42       ` Trond Myklebust
  2006-02-21 23:26     ` J. Bruce Fields
  1 sibling, 2 replies; 22+ messages in thread
From: Antonio Vargas @ 2006-02-21 14:15 UTC (permalink / raw)
  To: Trond Myklebust, Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On 2/21/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Mon, 2006-02-20 at 21:51 -0800, Andrew Morton wrote:
> > Oleg Drokin <green@linuxhacker.ru> wrote:
> > >
> > > Hello!
> > >
> > >    We are working on a lustre client that would not require any patches
> > >    to linux kernel. And there are few things that would be nice to have
> > >    that I'd like your input on.
> > >
> > >    One of those is FMODE_EXEC - to correctly detect cross-node situations with
> > >    executing a file that is opened for write or the other way around, we need
> > >    something like this extra file mode to be present (and used as a file open
> > >    mode when opening files for exection, e.g. in fs/exec.c)
> > >    Do you think there is a chance this can be included into vanilla kernel,
> > >    or is there a better solution I oversee?
> > >    I am just thinking about something as simple as this
> > >    (with some suitable FMODE_EXEC define, of course):
> > >
> > > --- linux/fs/exec.c.orig    2006-02-21 00:11:47.000000000 +0200
> > > +++ linux/fs/exec.c 2006-02-21 00:12:24.000000000 +0200
> > > @@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
> > >     struct nameidata nd;
> > >     int error;
> > >
> > > -   error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > > +   error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> > >     if (error)
> > >             goto out;
> > >
> > > @@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
> > >     int err;
> > >     struct file *file;
> > >
> > > -   err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > > +   err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> > >     file = ERR_PTR(err);
> > >
> > >     if (!err) {
> > >
> >
> > Such a patch would have zero runtime cost.  I'd have no problem carrying
> > that if it makes things easier for lustre, personally.
> >
> > We would need to understand whether this is needed by other distributed
> > filesystems and if so, whether the proposed implementation is suitable and
> > sufficient.
>
> Hmm.... We might possibly want to use that for NFSv4 at some point in
> order to deny write access to the file to other clients while it is in
> use.

When done with regards to failing a write if anyone has mapped the
file for executing it, or failing the execute if it's open/mmaped for
write, I can't really see the difference between local, remote and
clustered filesystems...

--
Greetz, Antonio Vargas aka winden of network

http://wind.codepixel.com/
windNOenSPAMntw@gmail.com
thesameasabove@amigascne.org

Every day, every year
you have to work
you have to study
you have to scene.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 14:15     ` Antonio Vargas
@ 2006-02-21 14:21       ` Oleg Drokin
  2006-02-22  9:57         ` Antonio Vargas
  2006-02-21 14:42       ` Trond Myklebust
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Drokin @ 2006-02-21 14:21 UTC (permalink / raw)
  To: Antonio Vargas; +Cc: linux-kernel, linux-fsdevel

Hello!

On Tue, Feb 21, 2006 at 03:15:53PM +0100, Antonio Vargas wrote:
> > > We would need to understand whether this is needed by other distributed
> > > filesystems and if so, whether the proposed implementation is suitable and
> > > sufficient.
> > Hmm.... We might possibly want to use that for NFSv4 at some point in
> > order to deny write access to the file to other clients while it is in
> > use.
> When done with regards to failing a write if anyone has mapped the
> file for executing it, or failing the execute if it's open/mmaped for
> write, I can't really see the difference between local, remote and
> clustered filesystems...

Currently this is only possible locally, when both execution and opening
for writing is performed on the same node. Then VFS enforces ETXTBSY.
But if you do exec on one node and open for writing on another,
VFSes on those nodes have no idea on what happens on all other nodes.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 14:15     ` Antonio Vargas
  2006-02-21 14:21       ` Oleg Drokin
@ 2006-02-21 14:42       ` Trond Myklebust
  1 sibling, 0 replies; 22+ messages in thread
From: Trond Myklebust @ 2006-02-21 14:42 UTC (permalink / raw)
  To: Antonio Vargas; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Tue, 2006-02-21 at 15:15 +0100, Antonio Vargas wrote:
> On 2/21/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > On Mon, 2006-02-20 at 21:51 -0800, Andrew Morton wrote:
> > > Oleg Drokin <green@linuxhacker.ru> wrote:
> > > >
> > > > Hello!
> > > >
> > > >    We are working on a lustre client that would not require any patches
> > > >    to linux kernel. And there are few things that would be nice to have
> > > >    that I'd like your input on.
> > > >
> > > >    One of those is FMODE_EXEC - to correctly detect cross-node situations with
> > > >    executing a file that is opened for write or the other way around, we need
> > > >    something like this extra file mode to be present (and used as a file open
> > > >    mode when opening files for exection, e.g. in fs/exec.c)
> > > >    Do you think there is a chance this can be included into vanilla kernel,
> > > >    or is there a better solution I oversee?
> > > >    I am just thinking about something as simple as this
> > > >    (with some suitable FMODE_EXEC define, of course):
> > > >
> > > > --- linux/fs/exec.c.orig    2006-02-21 00:11:47.000000000 +0200
> > > > +++ linux/fs/exec.c 2006-02-21 00:12:24.000000000 +0200
> > > > @@ -127,7 +127,7 @@ asmlinkage long sys_uselib(const char __
> > > >     struct nameidata nd;
> > > >     int error;
> > > >
> > > > -   error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > > > +   error = __user_path_lookup_open(library, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> > > >     if (error)
> > > >             goto out;
> > > >
> > > > @@ -477,7 +477,7 @@ struct file *open_exec(const char *name)
> > > >     int err;
> > > >     struct file *file;
> > > >
> > > > -   err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ);
> > > > +   err = path_lookup_open(name, LOOKUP_FOLLOW, &nd, FMODE_READ|FMODE_EXEC);
> > > >     file = ERR_PTR(err);
> > > >
> > > >     if (!err) {
> > > >
> > >
> > > Such a patch would have zero runtime cost.  I'd have no problem carrying
> > > that if it makes things easier for lustre, personally.
> > >
> > > We would need to understand whether this is needed by other distributed
> > > filesystems and if so, whether the proposed implementation is suitable and
> > > sufficient.
> >
> > Hmm.... We might possibly want to use that for NFSv4 at some point in
> > order to deny write access to the file to other clients while it is in
> > use.
> 
> When done with regards to failing a write if anyone has mapped the
> file for executing it, or failing the execute if it's open/mmaped for
> write, I can't really see the difference between local, remote and
> clustered filesystems...

There is a huge difference.

On local filesystems, get_write_access(), put_write_access() and
deny_write_access() may be sufficient, but for remote and clustered
filesystems you need for the filesystem itself to be able to enforce the
lock to remote writers.

NFSv2 and NFSv3 don't actually have any form of mandatory locking we can
use, but NFSv4 has CIFS-like deny bits that can be used in the OPEN call
to deny remote write access.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 13:59   ` Trond Myklebust
  2006-02-21 14:15     ` Antonio Vargas
@ 2006-02-21 23:26     ` J. Bruce Fields
  2006-02-21 23:32       ` Trond Myklebust
  1 sibling, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2006-02-21 23:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Tue, Feb 21, 2006 at 08:59:56AM -0500, Trond Myklebust wrote:
> Hmm.... We might possibly want to use that for NFSv4 at some point in
> order to deny write access to the file to other clients while it is in
> use.

So on the NFS client, an open with FMODE_EXEC could be translated into
an NFSv4 open with a deny_write bit (since NFSv4 opens also do windows
share locks).

An NFSv4 server might also be able to translate deny mode writes into
FMODE_EXEC in the case where it was exporting a cluster filesystem.  It
wouldn't completely solve the problem of implementing cluster-coherent
share locks (which also let you deny reads, who knows why), but it seems
like it would address the case most likely to matter.

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 23:26     ` J. Bruce Fields
@ 2006-02-21 23:32       ` Trond Myklebust
  2006-02-22 19:57         ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Trond Myklebust @ 2006-02-21 23:32 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Tue, 2006-02-21 at 18:26 -0500, J. Bruce Fields wrote:
> On Tue, Feb 21, 2006 at 08:59:56AM -0500, Trond Myklebust wrote:
> > Hmm.... We might possibly want to use that for NFSv4 at some point in
> > order to deny write access to the file to other clients while it is in
> > use.
> 
> So on the NFS client, an open with FMODE_EXEC could be translated into
> an NFSv4 open with a deny_write bit (since NFSv4 opens also do windows
> share locks).
> 
> An NFSv4 server might also be able to translate deny mode writes into
> FMODE_EXEC in the case where it was exporting a cluster filesystem.  It
> wouldn't completely solve the problem of implementing cluster-coherent
> share locks (which also let you deny reads, who knows why), but it seems
> like it would address the case most likely to matter.

Hmm... I don't think you want to overload write deny bits onto
FMODE_EXEC. FMODE_EXEC is basically, a read-only mode, so it
would mean that you could no longer do something like

  OPEN(READ|WRITE,DENY_WRITE) 

which I would assume is one of the more frequent Windoze open modes.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 10:39 ` Christoph Hellwig
@ 2006-02-22  1:03   ` Chris Wedgwood
  2006-02-22  8:59     ` Steven Whitehouse
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wedgwood @ 2006-02-22  1:03 UTC (permalink / raw)
  To: Christoph Hellwig, Oleg Drokin, linux-kernel, linux-fsdevel

On Tue, Feb 21, 2006 at 10:39:49AM +0000, Christoph Hellwig wrote:

> The patch looks fine to me.  We can put it in once we'll put in the
> full lustre client.

ocfs2 could probably use it sooner or later surely?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22  1:03   ` Chris Wedgwood
@ 2006-02-22  8:59     ` Steven Whitehouse
  2006-02-22 21:42       ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Steven Whitehouse @ 2006-02-22  8:59 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: linux-fsdevel, linux-kernel, Oleg Drokin, Christoph Hellwig

Hi,

Also GFS2 (which we hope to be ready to submit to the kernel shortly)
would want to make use of this feature. Its been a long standing entry
on our TODO list,

Steve.

On Tue, 2006-02-21 at 17:03 -0800, Chris Wedgwood wrote:
> On Tue, Feb 21, 2006 at 10:39:49AM +0000, Christoph Hellwig wrote:
> 
> > The patch looks fine to me.  We can put it in once we'll put in the
> > full lustre client.
> 
> ocfs2 could probably use it sooner or later surely?
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 14:21       ` Oleg Drokin
@ 2006-02-22  9:57         ` Antonio Vargas
  0 siblings, 0 replies; 22+ messages in thread
From: Antonio Vargas @ 2006-02-22  9:57 UTC (permalink / raw)
  To: Oleg Drokin, linux-kernel, linux-fsdevel

On 2/21/06, Oleg Drokin <green@linuxhacker.ru> wrote:
> Hello!
>
> On Tue, Feb 21, 2006 at 03:15:53PM +0100, Antonio Vargas wrote:
> > > > We would need to understand whether this is needed by other distributed
> > > > filesystems and if so, whether the proposed implementation is suitable and
> > > > sufficient.
> > > Hmm.... We might possibly want to use that for NFSv4 at some point in
> > > order to deny write access to the file to other clients while it is in
> > > use.
> > When done with regards to failing a write if anyone has mapped the
> > file for executing it, or failing the execute if it's open/mmaped for
> > write, I can't really see the difference between local, remote and
> > clustered filesystems...
>
> Currently this is only possible locally, when both execution and opening
> for writing is performed on the same node. Then VFS enforces ETXTBSY.
> But if you do exec on one node and open for writing on another,
> VFSes on those nodes have no idea on what happens on all other nodes.
>

Thanks for the enlightement Oleg, I had assumed the owner of the file
to write over the executable while it's being executed... sort of
self-modifying code, but s/self/external/ ;)

Greetz, Antonio Vargas aka winden of network

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-21 23:32       ` Trond Myklebust
@ 2006-02-22 19:57         ` J. Bruce Fields
  2006-02-22 21:36           ` Trond Myklebust
  0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2006-02-22 19:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Tue, Feb 21, 2006 at 06:32:31PM -0500, Trond Myklebust wrote:
> Hmm... I don't think you want to overload write deny bits onto
> FMODE_EXEC. FMODE_EXEC is basically, a read-only mode, so it
> would mean that you could no longer do something like
> 
>   OPEN(READ|WRITE,DENY_WRITE) 
> 
> which I would assume is one of the more frequent Windoze open modes.

Since exec will never use the above combination, I don't think the
current proposal mandates any particular semantics in that case.

So I'm assuming that we could choose the semantics to fit nfsd's
purposes.  Am I missing anything?

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22 19:57         ` J. Bruce Fields
@ 2006-02-22 21:36           ` Trond Myklebust
  2006-02-22 22:04             ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Trond Myklebust @ 2006-02-22 21:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Wed, 2006-02-22 at 14:57 -0500, J. Bruce Fields wrote:
> On Tue, Feb 21, 2006 at 06:32:31PM -0500, Trond Myklebust wrote:
> > Hmm... I don't think you want to overload write deny bits onto
> > FMODE_EXEC. FMODE_EXEC is basically, a read-only mode, so it
> > would mean that you could no longer do something like
> > 
> >   OPEN(READ|WRITE,DENY_WRITE) 
> > 
> > which I would assume is one of the more frequent Windoze open modes.
> 
> Since exec will never use the above combination, I don't think the
> current proposal mandates any particular semantics in that case.
> 
> So I'm assuming that we could choose the semantics to fit nfsd's
> purposes.  Am I missing anything?

Yes. I'm saying that your mapping of the  NFSv4 DENY_WRITE share mode
into FMODE_EXEC will _only_ work for the specific combination
OPEN(READ,DENY_WRITE).

Basically, your proposal makes heavy assumptions on what clients will
want to use the share modes for, and will misbehave badly for any client
that breaks those assumptions.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22  8:59     ` Steven Whitehouse
@ 2006-02-22 21:42       ` J. Bruce Fields
  2006-02-22 22:02         ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2006-02-22 21:42 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Chris Wedgwood, linux-fsdevel, linux-kernel, Oleg Drokin,
	Christoph Hellwig

On Wed, Feb 22, 2006 at 08:59:00AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> Also GFS2 (which we hope to be ready to submit to the kernel shortly)
> would want to make use of this feature. Its been a long standing entry
> on our TODO list,

So there's a question here about ordering of these sorts of patches.

At CITI we've done some work on making nfsd play nicely with cluster
filesystems (e.g., to allow consistent locking across NFS clients
talking to different NFS servers exporting the same cluster filesystem).

The work is partly motivated by (and so far only tested with)
out-of-tree filesystems.  We've had some minor discussion with GFS2 and
OCFS2 hackers but assumed there was no use asking for serious review
until there was an in-tree filesystem that could take advantage of them.
(So we're happy to hear GFS2 is nearly ready--OCFS2 isn't attempting
cluster-coherent locking at all for now.)

Should we be pushing these sorts of patches earlier, at least as long as
they're fairly low-impact?  Or should we be observing an absolute rule
against introducing new interfaces without also introducing in-tree
users?

(Patch follows as an example--I'm not requesting it be applied for now.)

--b.


asynchronous locking api for lock managers on cluster filesystems

There is currently a filesystem ->lock() method, but it is defined only by
a few filesystems that are not exported via nfs.  So none of the lock
routines that are used by lockd or nfsv4 bother to call those methods.

Cluster filesystems would like to be able define their own ->lock() methods
and also would like to be exportable via NFS.

So we add vfs_lock_file, vfs_test_lock, and vfs_cancel_lock routines which
do call the underlying filesystem's lock routines.  These are intended to
be used by lockd and nfsd; lockd and nfsd changes to take advantage of them
are made by later patches.

Acquiring a lock may require communication with remote hosts, and to
avoid blocking lockd or nfsd threads during such communication, we allow
the results to be returned asynchronously.

When a ->lock() call needs to block, the file system will return
-EINPROGRESS, and then later return the results with a call to the routine
in the fl_vfs_callback of the lock_manager_operations struct.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---

 fs/locks.c         |   97 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    6 +++
 2 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index d3d4567..6acf318 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1011,6 +1011,103 @@ int posix_lock_file_wait(struct file *fi
 EXPORT_SYMBOL(posix_lock_file_wait);
 
 /**
+ * vfs_lock_file - file byte range lock
+ * @filp: The file to apply the lock to
+ * @fl: The lock to be applied
+ *
+ * To avoid blocking kernel daemons, such as lockd, that need to acquire POSIX
+ * locks, the ->lock() interface may return asynchronously, before the lock has
+ * been granted or denied by the underlying filesystem, if (and only if)
+ * fl_vfs_callback is set. Callers expecting ->lock() to return asynchronously
+ * will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if)
+ * the request is for a blocking lock. When ->lock() does return asynchronously,
+ * it must return -EINPROGRESS, and call ->fl_vfs_callback() when the lock
+ * request completes.
+ * If the request is for non-blocking lock the file system should return
+ * -EINPROGRESS then try to get the lock and call the callback routine with
+ * the result. If the request timed out the callback routine will return a
+ * nonzero return code and the file system should release the lock. The file
+ * system is also responsible to keep a corresponding posix lock when it
+ * grants a lock so the VFS can find out which locks are locally held and do
+ * the correct lock cleanup when required.
+ * The underlying filesystem must not drop the kernel lock or call
+ * ->fl_vfs_callback() before returning to the caller with a -EINPROGRESS
+ * return code.
+ */
+int vfs_lock_file(struct file *filp, struct file_lock *fl)
+{
+	if (filp->f_op && filp->f_op->lock)
+		return filp->f_op->lock(filp, F_SETLK, fl);
+	else
+		return __posix_lock_file_conf(filp->f_dentry->d_inode, fl, NULL);
+}
+EXPORT_SYMBOL(vfs_lock_file);
+
+/**
+ * vfs_lock_file - file byte range lock
+ * @filp: The file to apply the lock to
+ * @fl: The lock to be applied
+ * @conf: Place to return a copy of the conflicting lock, if found.
+ *
+ * read comments for vfs_lock_file()
+ */
+int vfs_lock_file_conf(struct file *filp, struct file_lock *fl, struct file_lock *conf)
+{
+       if (filp->f_op && filp->f_op->lock) {
+	       locks_copy_lock(conf, fl);
+	       return filp->f_op->lock(filp, F_SETLK, fl);
+       } else
+               return __posix_lock_file_conf(filp->f_dentry->d_inode, fl, conf);
+}
+EXPORT_SYMBOL(vfs_lock_file_conf);
+
+/**
+ * vfs_test_lock - test file byte range lock
+ * @filp: The file to test lock for
+ * @fl: The lock to test
+ * @conf: Place to return a copy of the conflicting lock, if found.
+ */
+int vfs_test_lock(struct file *filp, struct file_lock *fl, struct file_lock *conf)
+{
+	int error;
+
+	conf->fl_type = F_UNLCK;
+	if (filp->f_op && filp->f_op->lock) {
+ 		locks_copy_lock(conf, fl);
+		error = filp->f_op->lock(filp, F_GETLK, conf);
+		if (!error) {
+			if (conf->fl_type != F_UNLCK)
+				error =  1;
+		}
+		return error;
+ 	} else
+		return posix_test_lock(filp, fl, conf);
+}
+EXPORT_SYMBOL(vfs_test_lock);
+
+/**
+ * vfs_cancel_lock - file byte range unblock lock
+ * @filp: The file to apply the unblock to
+ * @fl: The lock to be unblocked
+ *
+ * FL_CANCELED is used to cancel blocked requests
+ */
+void vfs_cancel_lock(struct file *filp, struct file_lock *fl)
+{
+	lock_kernel();
+	fl->fl_flags |= FL_CANCEL;
+	if (filp->f_op && filp->f_op->lock) {
+		/* XXX: check locking */
+		unlock_kernel();
+		filp->f_op->lock(filp, F_SETLK, fl);
+	} else {
+		posix_unblock_lock(filp, fl);
+		unlock_kernel();
+	}
+}
+EXPORT_SYMBOL(vfs_cancel_lock);
+
+/**
  * locks_mandatory_locked - Check for an active lock
  * @inode: the file to check
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a57696c..4ac53c0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -666,6 +666,7 @@ extern spinlock_t files_lock;
 #define FL_POSIX	1
 #define FL_FLOCK	2
 #define FL_ACCESS	8	/* not trying to lock, just looking */
+#define FL_CANCEL	16	/* set to request cancelling a lock */
 #define FL_LEASE	32	/* lease held on this file */
 #define FL_LEASE_DENTRY	64	/* lease also broken by rename/unlink */
 #define FL_SLEEP	128	/* A blocking lock */
@@ -694,6 +695,7 @@ struct lock_manager_operations {
 	void (*fl_break)(struct file_lock *);
 	int (*fl_mylease)(struct file_lock *, struct file_lock *);
 	int (*fl_change)(struct file_lock **, int);
+	int (*fl_vfs_callback)(struct file_lock *, struct file_lock *, int result);
 };
 
 /* that will die - we need it for nfs_lock_info */
@@ -753,6 +755,10 @@ extern void locks_init_lock(struct file_
 extern void locks_copy_lock(struct file_lock *, struct file_lock *);
 extern void locks_remove_posix(struct file *, fl_owner_t);
 extern void locks_remove_flock(struct file *);
+extern int vfs_lock_file(struct file *, struct file_lock *);
+extern int vfs_lock_file_conf(struct file *, struct file_lock *, struct file_lock *);
+extern int vfs_test_lock(struct file *, struct file_lock *, struct file_lock *);
+extern void vfs_cancel_lock(struct file *, struct file_lock *);
 extern int posix_test_lock(struct file *, struct file_lock *, struct file_lock *);
 extern int posix_lock_file_conf(struct file *, struct file_lock *, struct file_lock *);
 extern int posix_lock_file(struct file *, struct file_lock *);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22 21:42       ` J. Bruce Fields
@ 2006-02-22 22:02         ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2006-02-22 22:02 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Steven Whitehouse, Chris Wedgwood, linux-fsdevel, linux-kernel,
	Oleg Drokin, Christoph Hellwig

On Wed, Feb 22, 2006 at 04:42:02PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 22, 2006 at 08:59:00AM +0000, Steven Whitehouse wrote:
> > Hi,
> > 
> > Also GFS2 (which we hope to be ready to submit to the kernel shortly)
> > would want to make use of this feature. Its been a long standing entry
> > on our TODO list,
> 
> So there's a question here about ordering of these sorts of patches.
> 
> At CITI we've done some work on making nfsd play nicely with cluster
> filesystems (e.g., to allow consistent locking across NFS clients
> talking to different NFS servers exporting the same cluster filesystem).
> 
> The work is partly motivated by (and so far only tested with)
> out-of-tree filesystems.  We've had some minor discussion with GFS2 and
> OCFS2 hackers but assumed there was no use asking for serious review
> until there was an in-tree filesystem that could take advantage of them.
> (So we're happy to hear GFS2 is nearly ready--OCFS2 isn't attempting
> cluster-coherent locking at all for now.)
> 
> Should we be pushing these sorts of patches earlier, at least as long as
> they're fairly low-impact?  Or should we be observing an absolute rule
> against introducing new interfaces without also introducing in-tree
> users?

That patch adds lots of unused code so far now a clear NACK.  Even after
users are in mainline such additions would need a clear justification.
Can for example some existing locking code be removed?  fs/locks.c is
a huge mess right now and adding more and more entry points doesn't
exactly help there.

Oh, and given that's it's stuff deeply interwinded with the file locking
code it should become _GPL exports.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22 21:36           ` Trond Myklebust
@ 2006-02-22 22:04             ` J. Bruce Fields
  2006-02-22 22:17               ` Trond Myklebust
  0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2006-02-22 22:04 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Wed, Feb 22, 2006 at 04:36:56PM -0500, Trond Myklebust wrote:
> On Wed, 2006-02-22 at 14:57 -0500, J. Bruce Fields wrote:
> > On Tue, Feb 21, 2006 at 06:32:31PM -0500, Trond Myklebust wrote:
> > > Hmm... I don't think you want to overload write deny bits onto
> > > FMODE_EXEC. FMODE_EXEC is basically, a read-only mode, so it
> > > would mean that you could no longer do something like
> > > 
> > >   OPEN(READ|WRITE,DENY_WRITE) 
> > > 
> > > which I would assume is one of the more frequent Windoze open modes.
> > 
> > Since exec will never use the above combination, I don't think the
> > current proposal mandates any particular semantics in that case.
> > 
> > So I'm assuming that we could choose the semantics to fit nfsd's
> > purposes.  Am I missing anything?
> 
> Yes. I'm saying that your mapping of the  NFSv4 DENY_WRITE share mode
> into FMODE_EXEC will _only_ work for the specific combination
> OPEN(READ,DENY_WRITE).

I understand that if FMODE_WRITE|FMODE_EXEC opens must fail, then
FMODE_EXEC is a poor fit for DENY_WRITE.

What I don't understand is the source of the requirement that
FMODE_WRITE|FMODE_EXEC opens be disallowed.

The only users of FMODE_EXEC introduced by Oleg's patch use a hardcoded
FMODE_READ|FMODE_EXEC, so it doesn't seem to impose any constraints on
the meaning of FMODE_WRITE|FMODE_EXEC.

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22 22:04             ` J. Bruce Fields
@ 2006-02-22 22:17               ` Trond Myklebust
  2006-02-22 23:31                 ` J. Bruce Fields
  0 siblings, 1 reply; 22+ messages in thread
From: Trond Myklebust @ 2006-02-22 22:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Wed, 2006-02-22 at 17:04 -0500, J. Bruce Fields wrote:
> What I don't understand is the source of the requirement that
> FMODE_WRITE|FMODE_EXEC opens be disallowed.
> 
> The only users of FMODE_EXEC introduced by Oleg's patch use a hardcoded
> FMODE_READ|FMODE_EXEC, so it doesn't seem to impose any constraints on
> the meaning of FMODE_WRITE|FMODE_EXEC.

I understand FMODE_EXEC to mean that we want to call
deny_write_access(). OTOH, FMODE_WRITE is supposed to trigger an
automatic call to get_write_access().

Those two calls are mutually exclusive.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: FMODE_EXEC or alike?
  2006-02-22 22:17               ` Trond Myklebust
@ 2006-02-22 23:31                 ` J. Bruce Fields
  0 siblings, 0 replies; 22+ messages in thread
From: J. Bruce Fields @ 2006-02-22 23:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Andrew Morton, Oleg Drokin, linux-kernel, linux-fsdevel

On Wed, Feb 22, 2006 at 05:17:33PM -0500, Trond Myklebust wrote:
> I understand FMODE_EXEC to mean that we want to call
> deny_write_access(). OTOH, FMODE_WRITE is supposed to trigger an
> automatic call to get_write_access().

Ugh.  OK, well that makes FMODE_EXEC useless for the server, then.

It'd be nice to be able to give the server some way of enforcing the
deny bits.  Otherwise, though translating FMODE_EXEC to DENY_WRITE will
at least allow the nfs server to deny writes by other clients, it'll
still do nothing to protect against writers on the exported filesystem
(even local writers--nevermind the cluster case).

And an open flag is attractive since it gives at least some hope that we
might be able to do DENY_WRITE atomically with the open.  Samba
apparently just does the open and then tries to get a mandatory lock
afterwards, but that seems racy.

--b.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-02-22 23:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-20 22:19 FMODE_EXEC or alike? Oleg Drokin
2006-02-21  5:51 ` Andrew Morton
2006-02-21 11:30   ` Oleg Drokin
2006-02-21 11:36     ` Andrew Morton
2006-02-21 11:56       ` Oleg Drokin
2006-02-21 13:59   ` Trond Myklebust
2006-02-21 14:15     ` Antonio Vargas
2006-02-21 14:21       ` Oleg Drokin
2006-02-22  9:57         ` Antonio Vargas
2006-02-21 14:42       ` Trond Myklebust
2006-02-21 23:26     ` J. Bruce Fields
2006-02-21 23:32       ` Trond Myklebust
2006-02-22 19:57         ` J. Bruce Fields
2006-02-22 21:36           ` Trond Myklebust
2006-02-22 22:04             ` J. Bruce Fields
2006-02-22 22:17               ` Trond Myklebust
2006-02-22 23:31                 ` J. Bruce Fields
2006-02-21 10:39 ` Christoph Hellwig
2006-02-22  1:03   ` Chris Wedgwood
2006-02-22  8:59     ` Steven Whitehouse
2006-02-22 21:42       ` J. Bruce Fields
2006-02-22 22:02         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).