Linux-Fsdevel Archive on lore.kernel.org
 help / Atom feed
* Better interop for NFS/SMB file share mode/reservation
@ 2019-02-08 11:20 Amir Goldstein
  2019-02-08 13:10 ` Jeff Layton
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2019-02-08 11:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

Hi Bruce,

I have been following you discussion with Volker Lendecke
on the samba technical mailing list [1] and have had discussed
this issue with Volker myself as well.

I decided to start this new thread to bring some kernel developers
in the loop and to propose an idea that takes a somewhat
different approach to the "interop" approaches I have seen
so far. "interop" in this context often means consistency of file
lock states between samba and nfs server, but I am referring
to the stronger sense of interop with local filesystem on the server.

You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
solution to interop of NFS Share Reservation and SMB Share Mode
with local filesystems.
Some of the complaints on this approach were (rightfully) concerned
about DoS and the prospect of plaguing Linux with Windows server
"files left open" issues.

My idea comes from the observation that Windows server
administrators can release locked files that were left open by clients.
I suppose that an NFS server admin can do the same?
That realization makes "share access" locks (a.k.a. MAND_LOCK)
not so very different from oplocks (leases/delegations).
As long as samba and nfsd cooperate nicely with MAND_LOCK
semantics, we don't really have to force local filesystems
to obay MAND_LOCK semantics. If the file servers take leases
on local filesystems, they will not get exclusive write access for
files already open for write on local filesytem and same for read.

On local file access on the server that violates the share mode,
the file server acts as a grumpy washed out administrator that
automatically grants any lock revoke ticket after timeout.

This model may not fit use cases where "real" interop with
local filesystem is needed, but compared to the existing
solution (no interop at all) it is quite an improvement.

Furthermore, short of SMB DENY_DELETE, we may not even
need to change any kernel APIs.
The addition of O_DENY* open flags can make programming
easier, but taking a lease on an open file is still safe enough
to implement share reservation (no?).

Satisfying DENY_DELETE could be more tricky, but perhaps
the existing SILLYRENAME interface of==between knfsd and vfs
could be somehow utilized for this purpose?

I though of bringing this up as a TOPIC for LSF/MM, but wanted
to consult with you first. I am sure that you or Jeff can do a better
job than me in enumerating the "interop" file lock issues that
could be discussed in filesystems track forum.

Thoughts? Explanation why this idea is idiotic?

Thanks,
Amir.

[1] https://lists.samba.org/archive/samba-technical/2019-February/132366.html
[2] https://lore.kernel.org/lkml/1389953232-9428-1-git-send-email-piastry@etersoft.ru/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 11:20 Better interop for NFS/SMB file share mode/reservation Amir Goldstein
@ 2019-02-08 13:10 ` Jeff Layton
  2019-02-08 14:45   ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2019-02-08 13:10 UTC (permalink / raw)
  To: Amir Goldstein, J. Bruce Fields
  Cc: Volker.Lendecke, samba-technical, Linux NFS Mailing List,
	linux-fsdevel, Pavel Shilovsky

On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> Hi Bruce,
> 
> I have been following you discussion with Volker Lendecke
> on the samba technical mailing list [1] and have had discussed
> this issue with Volker myself as well.
> 
> I decided to start this new thread to bring some kernel developers
> in the loop and to propose an idea that takes a somewhat
> different approach to the "interop" approaches I have seen
> so far. "interop" in this context often means consistency of file
> lock states between samba and nfs server, but I am referring
> to the stronger sense of interop with local filesystem on the server.
> 
> You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> solution to interop of NFS Share Reservation and SMB Share Mode
> with local filesystems.
> Some of the complaints on this approach were (rightfully) concerned
> about DoS and the prospect of plaguing Linux with Windows server
> "files left open" issues.
> 
> My idea comes from the observation that Windows server
> administrators can release locked files that were left open by clients.
> I suppose that an NFS server admin can do the same?

The Linux kernel has no mechanism for this (aside from sending a SIGKILL
to lockd, which makes it drop all locks). Solaris did have a tool for
this at one point (and probably still does).

It's a little less of a problem now than it used to be with NFS, given
the move to NFSv4 (which has lease-based locking). If you have
misbehaving clients, you just kick them out and their locks eventually
go away. v3 locks can stick around in perpetuity however, so people have
long wanted such a tool on Linux as well.

> That realization makes "share access" locks (a.k.a. MAND_LOCK)
> not so very different from oplocks (leases/delegations).
> As long as samba and nfsd cooperate nicely with MAND_LOCK
> semantics, we don't really have to force local filesystems
> to obay MAND_LOCK semantics. If the file servers take leases
> on local filesystems, they will not get exclusive write access for
> files already open for write on local filesytem and same for read.
> 

I think this last statement isn't correct (if I'm parsing it correctly).
If a file is already open for write, then you just don't get a lease
when you try to request one. Ditto for write leases if it's already open
for read.

> On local file access on the server that violates the share mode,
> the file server acts as a grumpy washed out administrator that
> automatically grants any lock revoke ticket after timeout.
> 

Devil's advocate:

Is this situation any better than just teaching the NFS/SMB servers to
track these locks out of band? Both samba and most NFS servers respect
share/deny mode locks, but only internally -- they aren't aware of the
others'. We could (in principle) come up with a mechanism to track these
that doesn't involve plumbing them into the kernel.

That said, coherent locking is best done in the kernel, IMO...

> This model may not fit use cases where "real" interop with
> local filesystem is needed, but compared to the existing
> solution (no interop at all) it is quite an improvement.
> 
> Furthermore, short of SMB DENY_DELETE, we may not even
> need to change any kernel APIs.
> The addition of O_DENY* open flags can make programming
> easier, but taking a lease on an open file is still safe enough
> to implement share reservation (no?).
> 
> Satisfying DENY_DELETE could be more tricky, but perhaps
> the existing SILLYRENAME interface of==between knfsd and vfs
> could be somehow utilized for this purpose?
> 
> I though of bringing this up as a TOPIC for LSF/MM, but wanted
> to consult with you first. I am sure that you or Jeff can do a better
> job than me in enumerating the "interop" file lock issues that
> could be discussed in filesystems track forum.
> 
> Thoughts? Explanation why this idea is idiotic?

I think it's not a single idea. There are really two different aspects
to this given that we're really talking about two different types of
locks in SMB. I think you have to consider solving these problems
separately:

1) the ability to set a (typically whole-file) share/deny lock
atomically when you open a file. This is necessary for coherent
share/deny lock semantics. Note that these are only enforced open()
time.

2) mandatory locking (forbidding reads and writes on a byte range when
there is a conflicting lock set).

The first could (probably) be solved with something like what Pavel
proposed a few years ago...or maybe we just wire up O_EXLOCK and
O_SHLOCK:

    https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html

This seems like a fine idea (in principle) but it needs someone to drive
the work forward. You'll also likely be consuming a couple of O_* flags,
which could be tough sell (unless you come up with another way to do
it).

The second problem is much more difficult to fix correctly, and involves
interjecting locking checks into (hot) file read/write codepaths. This
is non-trivial and could have performance impacts even when no lock is
set on a file.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 13:10 ` Jeff Layton
@ 2019-02-08 14:45   ` Amir Goldstein
  2019-02-08 15:50     ` J. Bruce Fields
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Amir Goldstein @ 2019-02-08 14:45 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > Hi Bruce,
> >
> > I have been following you discussion with Volker Lendecke
> > on the samba technical mailing list [1] and have had discussed
> > this issue with Volker myself as well.
> >
> > I decided to start this new thread to bring some kernel developers
> > in the loop and to propose an idea that takes a somewhat
> > different approach to the "interop" approaches I have seen
> > so far. "interop" in this context often means consistency of file
> > lock states between samba and nfs server, but I am referring
> > to the stronger sense of interop with local filesystem on the server.
> >
> > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > solution to interop of NFS Share Reservation and SMB Share Mode
> > with local filesystems.
> > Some of the complaints on this approach were (rightfully) concerned
> > about DoS and the prospect of plaguing Linux with Windows server
> > "files left open" issues.
> >
> > My idea comes from the observation that Windows server
> > administrators can release locked files that were left open by clients.
> > I suppose that an NFS server admin can do the same?
>
> The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> to lockd, which makes it drop all locks). Solaris did have a tool for
> this at one point (and probably still does).
>
> It's a little less of a problem now than it used to be with NFS, given
> the move to NFSv4 (which has lease-based locking). If you have
> misbehaving clients, you just kick them out and their locks eventually
> go away. v3 locks can stick around in perpetuity however, so people have
> long wanted such a tool on Linux as well.
>

In a nut shell, I think my proposal is that samba will do something
similar and request leases from the kernel instead of trying to
enforce real mandatory locks.

> > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > not so very different from oplocks (leases/delegations).
> > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > semantics, we don't really have to force local filesystems
> > to obay MAND_LOCK semantics. If the file servers take leases
> > on local filesystems, they will not get exclusive write access for
> > files already open for write on local filesytem and same for read.
> >
>
> I think this last statement isn't correct (if I'm parsing it correctly).
> If a file is already open for write, then you just don't get a lease
> when you try to request one. Ditto for write leases if it's already open
> for read.
>

I think you miss read what I miss wrote ;-)
As the title of this thread states, I am talking about the first case
of acquiring an exclusive or read shared access to file at open time.
It may be the fact that samba currently calls flock(LOCK_MAND)
that is the source for confusion.

Open failure is the expected behavior if file is already open for
write (or read) on local filesystem, so my suggestion is:
- Server opens the file and request a lease based of desired share mode
- If file server got the lease, client gets the file handle
- Otherwise, client gets an open failure

> > On local file access on the server that violates the share mode,
> > the file server acts as a grumpy washed out administrator that
> > automatically grants any lock revoke ticket after timeout.
> >
>
> Devil's advocate:
>
> Is this situation any better than just teaching the NFS/SMB servers to
> track these locks out of band? Both samba and most NFS servers respect
> share/deny mode locks, but only internally -- they aren't aware of the
> others'. We could (in principle) come up with a mechanism to track these
> that doesn't involve plumbing them into the kernel.
>

That would be a prerequisite to my suggested solution, as I wrote:
"As long as samba and nfsd cooperate nicely with LOCK_MAND..."
That means the two file servers cooperate on the share mode locks
and try to figure out if there are outstanding leases before opening
a file that will break those leases.

> That said, coherent locking is best done in the kernel, IMO...
>

Indeed...

> > This model may not fit use cases where "real" interop with
> > local filesystem is needed, but compared to the existing
> > solution (no interop at all) it is quite an improvement.
> >
> > Furthermore, short of SMB DENY_DELETE, we may not even
> > need to change any kernel APIs.
> > The addition of O_DENY* open flags can make programming
> > easier, but taking a lease on an open file is still safe enough
> > to implement share reservation (no?).
> >
> > Satisfying DENY_DELETE could be more tricky, but perhaps
> > the existing SILLYRENAME interface of==between knfsd and vfs
> > could be somehow utilized for this purpose?
> >
> > I though of bringing this up as a TOPIC for LSF/MM, but wanted
> > to consult with you first. I am sure that you or Jeff can do a better
> > job than me in enumerating the "interop" file lock issues that
> > could be discussed in filesystems track forum.
> >
> > Thoughts? Explanation why this idea is idiotic?
>
> I think it's not a single idea. There are really two different aspects
> to this given that we're really talking about two different types of
> locks in SMB. I think you have to consider solving these problems
> separately:
>
> 1) the ability to set a (typically whole-file) share/deny lock
> atomically when you open a file. This is necessary for coherent
> share/deny lock semantics. Note that these are only enforced open()
> time.
>
> 2) mandatory locking (forbidding reads and writes on a byte range when
> there is a conflicting lock set).
>

I was only trying to address the first problem (small steps...).

> The first could (probably) be solved with something like what Pavel
> proposed a few years ago...or maybe we just wire up O_EXLOCK and
> O_SHLOCK:
>
>     https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
>

Nice. I wasn't aware of those BSD flags.

> This seems like a fine idea (in principle) but it needs someone to drive
> the work forward. You'll also likely be consuming a couple of O_* flags,
> which could be tough sell (unless you come up with another way to do
> it).
>

Once I know the obstacles to watch out from, I can drive this work.
Thing is, I am not convinced myself that any new O_ flags are needed.

How about this (for samba, knfsd is simpler):
- pfd = open(filename, O_PATH)
- flock(pfd, LOCK_MAND) (for file servers interop)
- vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
- open(filename, O_RDWR) (and verify st_ino like samba does)
- Request lease (for local fs interop)
- check_conflicting_open() is changed to use inode_is_open_for_read()
  instead of checking d_count and i_count.
- we already have i_readcount, just need to remove ifdef CONFIG_IMA
- On lease break (from local fs), break client oplocks and invalidate
file handle on server

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 14:45   ` Amir Goldstein
@ 2019-02-08 15:50     ` J. Bruce Fields
  2019-02-08 20:02       ` Amir Goldstein
  2019-02-08 16:03     ` Jeff Layton
  2019-02-11  5:31     ` ronnie sahlberg
  2 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2019-02-08 15:50 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> - check_conflicting_open() is changed to use inode_is_open_for_read()
>   instead of checking d_count and i_count.

Independently of the rest, I'd love to do away with those
d_count/i_count checks.  What's inode_is_open_for_read()?

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 14:45   ` Amir Goldstein
  2019-02-08 15:50     ` J. Bruce Fields
@ 2019-02-08 16:03     ` Jeff Layton
  2019-02-08 16:28       ` Jeffrey Layton
  2019-02-11  5:31     ` ronnie sahlberg
  2 siblings, 1 reply; 16+ messages in thread
From: Jeff Layton @ 2019-02-08 16:03 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: J. Bruce Fields, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, 2019-02-08 at 16:45 +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <jlayton@kernel.org> wrote:
> > On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > > Hi Bruce,
> > > 
> > > I have been following you discussion with Volker Lendecke
> > > on the samba technical mailing list [1] and have had discussed
> > > this issue with Volker myself as well.
> > > 
> > > I decided to start this new thread to bring some kernel developers
> > > in the loop and to propose an idea that takes a somewhat
> > > different approach to the "interop" approaches I have seen
> > > so far. "interop" in this context often means consistency of file
> > > lock states between samba and nfs server, but I am referring
> > > to the stronger sense of interop with local filesystem on the server.
> > > 
> > > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > > solution to interop of NFS Share Reservation and SMB Share Mode
> > > with local filesystems.
> > > Some of the complaints on this approach were (rightfully) concerned
> > > about DoS and the prospect of plaguing Linux with Windows server
> > > "files left open" issues.
> > > 
> > > My idea comes from the observation that Windows server
> > > administrators can release locked files that were left open by clients.
> > > I suppose that an NFS server admin can do the same?
> > 
> > The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> > to lockd, which makes it drop all locks). Solaris did have a tool for
> > this at one point (and probably still does).
> > 
> > It's a little less of a problem now than it used to be with NFS, given
> > the move to NFSv4 (which has lease-based locking). If you have
> > misbehaving clients, you just kick them out and their locks eventually
> > go away. v3 locks can stick around in perpetuity however, so people have
> > long wanted such a tool on Linux as well.
> > 
> 
> In a nut shell, I think my proposal is that samba will do something
> similar and request leases from the kernel instead of trying to
> enforce real mandatory locks.
> 
> > > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > > not so very different from oplocks (leases/delegations).
> > > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > > semantics, we don't really have to force local filesystems
> > > to obay MAND_LOCK semantics. If the file servers take leases
> > > on local filesystems, they will not get exclusive write access for
> > > files already open for write on local filesytem and same for read.
> > > 
> > 
> > I think this last statement isn't correct (if I'm parsing it correctly).
> > If a file is already open for write, then you just don't get a lease
> > when you try to request one. Ditto for write leases if it's already open
> > for read.
> > 
> 
> I think you miss read what I miss wrote ;-)
> As the title of this thread states, I am talking about the first case
> of acquiring an exclusive or read shared access to file at open time.
> It may be the fact that samba currently calls flock(LOCK_MAND)
> that is the source for confusion.
> 
> Open failure is the expected behavior if file is already open for
> write (or read) on local filesystem, so my suggestion is:
> - Server opens the file and request a lease based of desired share mode
> - If file server got the lease, client gets the file handle
> - Otherwise, client gets an open failure

> > > On local file access on the server that violates the share mode,
> > > the file server acts as a grumpy washed out administrator that
> > > automatically grants any lock revoke ticket after timeout.
> > > 
> > 
> > Devil's advocate:
> > 
> > Is this situation any better than just teaching the NFS/SMB servers to
> > track these locks out of band? Both samba and most NFS servers respect
> > share/deny mode locks, but only internally -- they aren't aware of the
> > others'. We could (in principle) come up with a mechanism to track these
> > that doesn't involve plumbing them into the kernel.
> > 
> 
> That would be a prerequisite to my suggested solution, as I wrote:
> "As long as samba and nfsd cooperate nicely with LOCK_MAND..."
> That means the two file servers cooperate on the share mode locks
> and try to figure out if there are outstanding leases before opening
> a file that will break those leases.
> 
> > That said, coherent locking is best done in the kernel, IMO...
> > 
> 
> Indeed...
> 
> > > This model may not fit use cases where "real" interop with
> > > local filesystem is needed, but compared to the existing
> > > solution (no interop at all) it is quite an improvement.
> > > 
> > > Furthermore, short of SMB DENY_DELETE, we may not even
> > > need to change any kernel APIs.
> > > The addition of O_DENY* open flags can make programming
> > > easier, but taking a lease on an open file is still safe enough
> > > to implement share reservation (no?).
> > > 
> > > Satisfying DENY_DELETE could be more tricky, but perhaps
> > > the existing SILLYRENAME interface of==between knfsd and vfs
> > > could be somehow utilized for this purpose?
> > > 
> > > I though of bringing this up as a TOPIC for LSF/MM, but wanted
> > > to consult with you first. I am sure that you or Jeff can do a better
> > > job than me in enumerating the "interop" file lock issues that
> > > could be discussed in filesystems track forum.
> > > 
> > > Thoughts? Explanation why this idea is idiotic?
> > 
> > I think it's not a single idea. There are really two different aspects
> > to this given that we're really talking about two different types of
> > locks in SMB. I think you have to consider solving these problems
> > separately:
> > 
> > 1) the ability to set a (typically whole-file) share/deny lock
> > atomically when you open a file. This is necessary for coherent
> > share/deny lock semantics. Note that these are only enforced open()
> > time.
> > 
> > 2) mandatory locking (forbidding reads and writes on a byte range when
> > there is a conflicting lock set).
> > 
> 
> I was only trying to address the first problem (small steps...).
> 
> > The first could (probably) be solved with something like what Pavel
> > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > O_SHLOCK:
> > 
> >     https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> > 
> 
> Nice. I wasn't aware of those BSD flags.
> 

Share/deny open semantics are pretty similar across NFS and SMB (by
design, really). If you intend to solve that use-case, what you really
want is whole-file, shared/exclusive locks that are set atomically with
the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.

Then you could have SMB and NFS servers set these flags when opening
files, and deal with the occasional denial at open time. Other
applications won't be aware of them of course, but that's probably fine
for most use-cases where you want this sort of protocol interop.

DENY_DELETE is a bit harder to deal with however, but that's probably
something that could be addressed separately.

> > This seems like a fine idea (in principle) but it needs someone to drive
> > the work forward. You'll also likely be consuming a couple of O_* flags,
> > which could be tough sell (unless you come up with another way to do
> > it).
> > 
> 
> Once I know the obstacles to watch out from, I can drive this work.
> Thing is, I am not convinced myself that any new O_ flags are needed.
> 
> How about this (for samba, knfsd is simpler):
> - pfd = open(filename, O_PATH)
> - flock(pfd, LOCK_MAND) (for file servers interop)
> - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> - open(filename, O_RDWR) (and verify st_ino like samba does)
> - Request lease (for local fs interop)
> - check_conflicting_open() is changed to use inode_is_open_for_read()gi
> - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> - On lease break (from local fs), break client oplocks and invalidate
> file handle on server
> 

Now that I look at the handling of flock LOCK_MAND, I'm not sure how
it's supposed to work. In particular, flock_locks_conflict basically
says that a LOCK_MAND lock can never conflict with anything. I'm not
sure what good that does.

The flock manpage does not document LOCK_MAND. It's in /usr/include/asm-
generic/fcntl.h on my machine, but it looks like it just got taken right
out of the kernel headers long ago.

I think we need to have a hard look at what this flag is doing today
(seems like not much). What are samba's expectations with that flag?

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 16:03     ` Jeff Layton
@ 2019-02-08 16:28       ` Jeffrey Layton
  0 siblings, 0 replies; 16+ messages in thread
From: Jeffrey Layton @ 2019-02-08 16:28 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: J. Bruce Fields, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, 2019-02-08 at 11:03 -0500, Jeff Layton wrote:
> On Fri, 2019-02-08 at 16:45 +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > > > Hi Bruce,
> > > > 
> > > > I have been following you discussion with Volker Lendecke
> > > > on the samba technical mailing list [1] and have had discussed
> > > > this issue with Volker myself as well.
> > > > 
> > > > I decided to start this new thread to bring some kernel developers
> > > > in the loop and to propose an idea that takes a somewhat
> > > > different approach to the "interop" approaches I have seen
> > > > so far. "interop" in this context often means consistency of file
> > > > lock states between samba and nfs server, but I am referring
> > > > to the stronger sense of interop with local filesystem on the server.
> > > > 
> > > > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > > > solution to interop of NFS Share Reservation and SMB Share Mode
> > > > with local filesystems.
> > > > Some of the complaints on this approach were (rightfully) concerned
> > > > about DoS and the prospect of plaguing Linux with Windows server
> > > > "files left open" issues.
> > > > 
> > > > My idea comes from the observation that Windows server
> > > > administrators can release locked files that were left open by clients.
> > > > I suppose that an NFS server admin can do the same?
> > > 
> > > The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> > > to lockd, which makes it drop all locks). Solaris did have a tool for
> > > this at one point (and probably still does).
> > > 
> > > It's a little less of a problem now than it used to be with NFS, given
> > > the move to NFSv4 (which has lease-based locking). If you have
> > > misbehaving clients, you just kick them out and their locks eventually
> > > go away. v3 locks can stick around in perpetuity however, so people have
> > > long wanted such a tool on Linux as well.
> > > 
> > 
> > In a nut shell, I think my proposal is that samba will do something
> > similar and request leases from the kernel instead of trying to
> > enforce real mandatory locks.
> > 
> > > > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > > > not so very different from oplocks (leases/delegations).
> > > > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > > > semantics, we don't really have to force local filesystems
> > > > to obay MAND_LOCK semantics. If the file servers take leases
> > > > on local filesystems, they will not get exclusive write access for
> > > > files already open for write on local filesytem and same for read.
> > > > 
> > > 
> > > I think this last statement isn't correct (if I'm parsing it correctly).
> > > If a file is already open for write, then you just don't get a lease
> > > when you try to request one. Ditto for write leases if it's already open
> > > for read.
> > > 
> > 
> > I think you miss read what I miss wrote ;-)
> > As the title of this thread states, I am talking about the first case
> > of acquiring an exclusive or read shared access to file at open time.
> > It may be the fact that samba currently calls flock(LOCK_MAND)
> > that is the source for confusion.
> > 
> > Open failure is the expected behavior if file is already open for
> > write (or read) on local filesystem, so my suggestion is:
> > - Server opens the file and request a lease based of desired share mode
> > - If file server got the lease, client gets the file handle
> > - Otherwise, client gets an open failure
> > > > On local file access on the server that violates the share mode,
> > > > the file server acts as a grumpy washed out administrator that
> > > > automatically grants any lock revoke ticket after timeout.
> > > > 
> > > 
> > > Devil's advocate:
> > > 
> > > Is this situation any better than just teaching the NFS/SMB servers to
> > > track these locks out of band? Both samba and most NFS servers respect
> > > share/deny mode locks, but only internally -- they aren't aware of the
> > > others'. We could (in principle) come up with a mechanism to track these
> > > that doesn't involve plumbing them into the kernel.
> > > 
> > 
> > That would be a prerequisite to my suggested solution, as I wrote:
> > "As long as samba and nfsd cooperate nicely with LOCK_MAND..."
> > That means the two file servers cooperate on the share mode locks
> > and try to figure out if there are outstanding leases before opening
> > a file that will break those leases.
> > 
> > > That said, coherent locking is best done in the kernel, IMO...
> > > 
> > 
> > Indeed...
> > 
> > > > This model may not fit use cases where "real" interop with
> > > > local filesystem is needed, but compared to the existing
> > > > solution (no interop at all) it is quite an improvement.
> > > > 
> > > > Furthermore, short of SMB DENY_DELETE, we may not even
> > > > need to change any kernel APIs.
> > > > The addition of O_DENY* open flags can make programming
> > > > easier, but taking a lease on an open file is still safe enough
> > > > to implement share reservation (no?).
> > > > 
> > > > Satisfying DENY_DELETE could be more tricky, but perhaps
> > > > the existing SILLYRENAME interface of==between knfsd and vfs
> > > > could be somehow utilized for this purpose?
> > > > 
> > > > I though of bringing this up as a TOPIC for LSF/MM, but wanted
> > > > to consult with you first. I am sure that you or Jeff can do a better
> > > > job than me in enumerating the "interop" file lock issues that
> > > > could be discussed in filesystems track forum.
> > > > 
> > > > Thoughts? Explanation why this idea is idiotic?
> > > 
> > > I think it's not a single idea. There are really two different aspects
> > > to this given that we're really talking about two different types of
> > > locks in SMB. I think you have to consider solving these problems
> > > separately:
> > > 
> > > 1) the ability to set a (typically whole-file) share/deny lock
> > > atomically when you open a file. This is necessary for coherent
> > > share/deny lock semantics. Note that these are only enforced open()
> > > time.
> > > 
> > > 2) mandatory locking (forbidding reads and writes on a byte range when
> > > there is a conflicting lock set).
> > > 
> > 
> > I was only trying to address the first problem (small steps...).
> > 
> > > The first could (probably) be solved with something like what Pavel
> > > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > > O_SHLOCK:
> > > 
> > >     https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> > > 
> > 
> > Nice. I wasn't aware of those BSD flags.
> > 
> 
> Share/deny open semantics are pretty similar across NFS and SMB (by
> design, really). If you intend to solve that use-case, what you really
> want is whole-file, shared/exclusive locks that are set atomically with
> the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
> 
> Then you could have SMB and NFS servers set these flags when opening
> files, and deal with the occasional denial at open time. Other
> applications won't be aware of them of course, but that's probably fine
> for most use-cases where you want this sort of protocol interop.
> 
> DENY_DELETE is a bit harder to deal with however, but that's probably
> something that could be addressed separately.
> 
> > > This seems like a fine idea (in principle) but it needs someone to drive
> > > the work forward. You'll also likely be consuming a couple of O_* flags,
> > > which could be tough sell (unless you come up with another way to do
> > > it).
> > > 
> > 
> > Once I know the obstacles to watch out from, I can drive this work.
> > Thing is, I am not convinced myself that any new O_ flags are needed.
> > 
> > How about this (for samba, knfsd is simpler):
> > - pfd = open(filename, O_PATH)
> > - flock(pfd, LOCK_MAND) (for file servers interop)
> > - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> > - open(filename, O_RDWR) (and verify st_ino like samba does)
> > - Request lease (for local fs interop)
> > - check_conflicting_open() is changed to use inode_is_open_for_read()gi
> > - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> > - On lease break (from local fs), break client oplocks and invalidate
> > file handle on server
> > 
> 
> Now that I look at the handling of flock LOCK_MAND, I'm not sure how
> it's supposed to work. In particular, flock_locks_conflict basically
> says that a LOCK_MAND lock can never conflict with anything. I'm not
> sure what good that does.
> 
> The flock manpage does not document LOCK_MAND. It's in /usr/include/asm-
> generic/fcntl.h on my machine, but it looks like it just got taken right
> out of the kernel headers long ago.
> 
> I think we need to have a hard look at what this flag is doing today
> (seems like not much). What are samba's expectations with that flag?
> 

Yeah, in fact, I rolled this program and ran it in two different shells
on the same machine against the same file, and they both acquired a
lock:

---------------------------[snip]------------------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <fcntl.h>

int main(int argc, char **argv) {
	int fd, ret;

	fd = open(argv[1], O_RDWR|O_CREAT, 0644);
	if (fd < 0)
		perror("open");

	ret = flock(fd, LOCK_EX|LOCK_MAND);
	if (ret)
		perror("flock");
	printf("Lock acquired");
	getchar();
	return 0;
}
---------------------------[snip]------------------------------

I move that LOCK_MAND be nuked from orbit...or someone step forward to
propose reasonable semantics for it. :)

-- 
Jeffrey Layton <jlayton@samba.org>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 15:50     ` J. Bruce Fields
@ 2019-02-08 20:02       ` Amir Goldstein
  2019-02-08 20:16         ` J. Bruce Fields
  2019-02-08 22:12         ` Jeremy Allison
  0 siblings, 2 replies; 16+ messages in thread
From: Amir Goldstein @ 2019-02-08 20:02 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > - check_conflicting_open() is changed to use inode_is_open_for_read()
> >   instead of checking d_count and i_count.
>
> Independently of the rest, I'd love to do away with those
> d_count/i_count checks.  What's inode_is_open_for_read()?
>

It would look maybe something like this:

static inline bool file_is_open_for_read(const struct inode *file)
{
        struct inode *inode = file_inode(file);
        int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
FMODE_READ) ? 1 : 0;

        return atomic_read(&inode->i_readcount) > countself;
}

And it would allow for acquiring F_WRLCK lease if other
instances of inode are open O_PATH.
A slight change of semantics that seems harmless(?)
and will allow some flexibility.

But if samba can't figure out a way to keep a single open file
descriptor for oplocks per client-file, then this model doesn't
help us make any progress.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 20:02       ` Amir Goldstein
@ 2019-02-08 20:16         ` J. Bruce Fields
  2019-02-08 20:31           ` Amir Goldstein
  2019-02-08 22:12         ` Jeremy Allison
  1 sibling, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2019-02-08 20:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > >   instead of checking d_count and i_count.
> >
> > Independently of the rest, I'd love to do away with those
> > d_count/i_count checks.  What's inode_is_open_for_read()?
> >
> 
> It would look maybe something like this:
> 
> static inline bool file_is_open_for_read(const struct inode *file)
> {
>         struct inode *inode = file_inode(file);
>         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> FMODE_READ) ? 1 : 0;
> 
>         return atomic_read(&inode->i_readcount) > countself;
> }
> 
> And it would allow for acquiring F_WRLCK lease if other
> instances of inode are open O_PATH.
> A slight change of semantics that seems harmless(?)
> and will allow some flexibility.

How did I not know about i_readcount?  (Looking)  I guess it would mean
adding some dependence on CONFIG_IMA, hm.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 20:16         ` J. Bruce Fields
@ 2019-02-08 20:31           ` Amir Goldstein
  2019-02-14 20:51             ` J. Bruce Fields
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2019-02-08 20:31 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > >   instead of checking d_count and i_count.
> > >
> > > Independently of the rest, I'd love to do away with those
> > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > >
> >
> > It would look maybe something like this:
> >
> > static inline bool file_is_open_for_read(const struct inode *file)
> > {
> >         struct inode *inode = file_inode(file);
> >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > FMODE_READ) ? 1 : 0;
> >
> >         return atomic_read(&inode->i_readcount) > countself;
> > }
> >
> > And it would allow for acquiring F_WRLCK lease if other
> > instances of inode are open O_PATH.
> > A slight change of semantics that seems harmless(?)
> > and will allow some flexibility.
>
> How did I not know about i_readcount?  (Looking)  I guess it would mean
> adding some dependence on CONFIG_IMA, hm.
>

Yes, or we remove ifdef CONFIG_IMA from i_readcount.
I am not sure if the concern was size of struct inode
(shouldn't increase on 64bit arch) or the accounting on
open/close. The impact doesn't look significant (?)..

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 20:02       ` Amir Goldstein
  2019-02-08 20:16         ` J. Bruce Fields
@ 2019-02-08 22:12         ` Jeremy Allison
  2019-02-09  4:04           ` Amir Goldstein
  1 sibling, 1 reply; 16+ messages in thread
From: Jeremy Allison @ 2019-02-08 22:12 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: J. Bruce Fields, Linux NFS Mailing List, Volker.Lendecke,
	Jeff Layton, samba-technical, linux-fsdevel

On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > >   instead of checking d_count and i_count.
> >
> > Independently of the rest, I'd love to do away with those
> > d_count/i_count checks.  What's inode_is_open_for_read()?
> >
> 
> It would look maybe something like this:
> 
> static inline bool file_is_open_for_read(const struct inode *file)
> {
>         struct inode *inode = file_inode(file);
>         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> FMODE_READ) ? 1 : 0;
> 
>         return atomic_read(&inode->i_readcount) > countself;
> }
> 
> And it would allow for acquiring F_WRLCK lease if other
> instances of inode are open O_PATH.
> A slight change of semantics that seems harmless(?)
> and will allow some flexibility.
> 
> But if samba can't figure out a way to keep a single open file
> descriptor for oplocks per client-file, then this model doesn't
> help us make any progress.

Samba uses a single file descriptor per SMB2 open file
handle. Is this what you meant ? We need this to keep
the per-handle OFD locks around.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 22:12         ` Jeremy Allison
@ 2019-02-09  4:04           ` Amir Goldstein
  2019-02-14 21:06             ` J. Bruce Fields
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2019-02-09  4:04 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: J. Bruce Fields, Linux NFS Mailing List, Volker.Lendecke,
	Jeff Layton, samba-technical, linux-fsdevel

On Sat, Feb 9, 2019 at 12:12 AM Jeremy Allison <jra@samba.org> wrote:
>
> On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > >   instead of checking d_count and i_count.
> > >
> > > Independently of the rest, I'd love to do away with those
> > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > >
> >
> > It would look maybe something like this:
> >
> > static inline bool file_is_open_for_read(const struct inode *file)
> > {
> >         struct inode *inode = file_inode(file);
> >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > FMODE_READ) ? 1 : 0;
> >
> >         return atomic_read(&inode->i_readcount) > countself;
> > }
> >
> > And it would allow for acquiring F_WRLCK lease if other
> > instances of inode are open O_PATH.
> > A slight change of semantics that seems harmless(?)
> > and will allow some flexibility.
> >
> > But if samba can't figure out a way to keep a single open file
> > descriptor for oplocks per client-file, then this model doesn't
> > help us make any progress.
>
> Samba uses a single file descriptor per SMB2 open file
> handle. Is this what you meant ? We need this to keep
> the per-handle OFD locks around.

I understand now there are several cases when smbd has
several open file descriptors for the same client.
Is that related to this comment in samba wiki about kernel oplocks?

"Linux kernel oplocks don't provide the needed features.
(They don't even work correctly for oplocks...)"

Can you elaborate on that? is that because a samba oplock
is per client and therefore other file opens from same client
should not break its own lease?
If that is the case, than Bruce's work on the "delegations"
flavor of kernel oplocks could make them a good fit for samba.

As Bruce wrote, we could export the "delegations" flavor
to user space for samba, just after we figure out it matches
samba requirements.

Thanks,
Amir.

[1] https://wiki.samba.org/index.php/Samba3/SMB2#locking.2Fopen_files_.28fs_layer.29
[2] https://lore.kernel.org/lkml/1549656647-25115-1-git-send-email-bfields@redhat.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 14:45   ` Amir Goldstein
  2019-02-08 15:50     ` J. Bruce Fields
  2019-02-08 16:03     ` Jeff Layton
@ 2019-02-11  5:31     ` ronnie sahlberg
  2 siblings, 0 replies; 16+ messages in thread
From: ronnie sahlberg @ 2019-02-11  5:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Linux NFS Mailing List, Volker Lendecke,
	samba-technical, linux-fsdevel

On Sat, Feb 9, 2019 at 12:47 AM Amir Goldstein via samba-technical
<samba-technical@lists.samba.org> wrote:
>
> On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > > Hi Bruce,
> > >
> > > I have been following you discussion with Volker Lendecke
> > > on the samba technical mailing list [1] and have had discussed
> > > this issue with Volker myself as well.
> > >
> > > I decided to start this new thread to bring some kernel developers
> > > in the loop and to propose an idea that takes a somewhat
> > > different approach to the "interop" approaches I have seen
> > > so far. "interop" in this context often means consistency of file
> > > lock states between samba and nfs server, but I am referring
> > > to the stronger sense of interop with local filesystem on the server.
> > >
> > > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > > solution to interop of NFS Share Reservation and SMB Share Mode
> > > with local filesystems.
> > > Some of the complaints on this approach were (rightfully) concerned
> > > about DoS and the prospect of plaguing Linux with Windows server
> > > "files left open" issues.
> > >
> > > My idea comes from the observation that Windows server
> > > administrators can release locked files that were left open by clients.
> > > I suppose that an NFS server admin can do the same?
> >
> > The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> > to lockd, which makes it drop all locks). Solaris did have a tool for
> > this at one point (and probably still does).
> >
> > It's a little less of a problem now than it used to be with NFS, given
> > the move to NFSv4 (which has lease-based locking). If you have
> > misbehaving clients, you just kick them out and their locks eventually
> > go away. v3 locks can stick around in perpetuity however, so people have
> > long wanted such a tool on Linux as well.
> >

Now, maybe NFSv3 is rapidly becoming obsolete so it might not be worth
spending time on.
We has such tools at EMC for our servers, NetApp had too. Customers
loved these tools especially since
locking and nfsv3 was always very fragile :-)

Many years ago I looked at linux nfs in order to see if I could write
such tools.
(the tools themselves are really simple  they just uses the NLM
protocol which the kernel already speaks so no need to add any new
apis.)
Takling from dim memory, but I think what tripped me up is that for
NLM_TEST reply, linux does not fill in the OH field, i.e. the owner
string,
which you would need to send a spoofed NLM_UNLOCK to release the lock.


>
> In a nut shell, I think my proposal is that samba will do something
> similar and request leases from the kernel instead of trying to
> enforce real mandatory locks.
>
> > > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > > not so very different from oplocks (leases/delegations).
> > > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > > semantics, we don't really have to force local filesystems
> > > to obay MAND_LOCK semantics. If the file servers take leases
> > > on local filesystems, they will not get exclusive write access for
> > > files already open for write on local filesytem and same for read.
> > >
> >
> > I think this last statement isn't correct (if I'm parsing it correctly).
> > If a file is already open for write, then you just don't get a lease
> > when you try to request one. Ditto for write leases if it's already open
> > for read.
> >
>
> I think you miss read what I miss wrote ;-)
> As the title of this thread states, I am talking about the first case
> of acquiring an exclusive or read shared access to file at open time.
> It may be the fact that samba currently calls flock(LOCK_MAND)
> that is the source for confusion.
>
> Open failure is the expected behavior if file is already open for
> write (or read) on local filesystem, so my suggestion is:
> - Server opens the file and request a lease based of desired share mode
> - If file server got the lease, client gets the file handle
> - Otherwise, client gets an open failure
>
> > > On local file access on the server that violates the share mode,
> > > the file server acts as a grumpy washed out administrator that
> > > automatically grants any lock revoke ticket after timeout.
> > >
> >
> > Devil's advocate:
> >
> > Is this situation any better than just teaching the NFS/SMB servers to
> > track these locks out of band? Both samba and most NFS servers respect
> > share/deny mode locks, but only internally -- they aren't aware of the
> > others'. We could (in principle) come up with a mechanism to track these
> > that doesn't involve plumbing them into the kernel.
> >
>
> That would be a prerequisite to my suggested solution, as I wrote:
> "As long as samba and nfsd cooperate nicely with LOCK_MAND..."
> That means the two file servers cooperate on the share mode locks
> and try to figure out if there are outstanding leases before opening
> a file that will break those leases.
>
> > That said, coherent locking is best done in the kernel, IMO...
> >
>
> Indeed...
>
> > > This model may not fit use cases where "real" interop with
> > > local filesystem is needed, but compared to the existing
> > > solution (no interop at all) it is quite an improvement.
> > >
> > > Furthermore, short of SMB DENY_DELETE, we may not even
> > > need to change any kernel APIs.
> > > The addition of O_DENY* open flags can make programming
> > > easier, but taking a lease on an open file is still safe enough
> > > to implement share reservation (no?).
> > >
> > > Satisfying DENY_DELETE could be more tricky, but perhaps
> > > the existing SILLYRENAME interface of==between knfsd and vfs
> > > could be somehow utilized for this purpose?
> > >
> > > I though of bringing this up as a TOPIC for LSF/MM, but wanted
> > > to consult with you first. I am sure that you or Jeff can do a better
> > > job than me in enumerating the "interop" file lock issues that
> > > could be discussed in filesystems track forum.
> > >
> > > Thoughts? Explanation why this idea is idiotic?
> >
> > I think it's not a single idea. There are really two different aspects
> > to this given that we're really talking about two different types of
> > locks in SMB. I think you have to consider solving these problems
> > separately:
> >
> > 1) the ability to set a (typically whole-file) share/deny lock
> > atomically when you open a file. This is necessary for coherent
> > share/deny lock semantics. Note that these are only enforced open()
> > time.
> >
> > 2) mandatory locking (forbidding reads and writes on a byte range when
> > there is a conflicting lock set).
> >
>
> I was only trying to address the first problem (small steps...).
>
> > The first could (probably) be solved with something like what Pavel
> > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > O_SHLOCK:
> >
> >     https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> >
>
> Nice. I wasn't aware of those BSD flags.
>
> > This seems like a fine idea (in principle) but it needs someone to drive
> > the work forward. You'll also likely be consuming a couple of O_* flags,
> > which could be tough sell (unless you come up with another way to do
> > it).
> >
>
> Once I know the obstacles to watch out from, I can drive this work.
> Thing is, I am not convinced myself that any new O_ flags are needed.
>
> How about this (for samba, knfsd is simpler):
> - pfd = open(filename, O_PATH)
> - flock(pfd, LOCK_MAND) (for file servers interop)
> - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> - open(filename, O_RDWR) (and verify st_ino like samba does)
> - Request lease (for local fs interop)
> - check_conflicting_open() is changed to use inode_is_open_for_read()
>   instead of checking d_count and i_count.
> - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> - On lease break (from local fs), break client oplocks and invalidate
> file handle on server
>
> Thanks,
> Amir.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-08 20:31           ` Amir Goldstein
@ 2019-02-14 20:51             ` J. Bruce Fields
  2019-02-15  7:31               ` Amir Goldstein
  0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2019-02-14 20:51 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > >   instead of checking d_count and i_count.
> > > >
> > > > Independently of the rest, I'd love to do away with those
> > > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > > >
> > >
> > > It would look maybe something like this:
> > >
> > > static inline bool file_is_open_for_read(const struct inode *file)
> > > {
> > >         struct inode *inode = file_inode(file);
> > >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > FMODE_READ) ? 1 : 0;
> > >
> > >         return atomic_read(&inode->i_readcount) > countself;
> > > }
> > >
> > > And it would allow for acquiring F_WRLCK lease if other
> > > instances of inode are open O_PATH.
> > > A slight change of semantics that seems harmless(?)
> > > and will allow some flexibility.
> >
> > How did I not know about i_readcount?  (Looking)  I guess it would mean
> > adding some dependence on CONFIG_IMA, hm.
> >
> 
> Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> I am not sure if the concern was size of struct inode
> (shouldn't increase on 64bit arch) or the accounting on
> open/close. The impact doesn't look significant (?)..

Looks like the original patch was d984ea604943bb "fs: move i_readcount".
I did some googling around and looked at the discussion summarized by
https://lwn.net/Articles/410895/ but can't find useful discussion of
i_readcount impact.

Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.

Maybe something like this?

--b.

commit 02cfda99ed8c
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Thu Feb 14 15:02:02 2019 -0500

    locks: use i_readcount to detect lease conflicts
    
    The lease code currently uses the inode and dentry refcounts to detect
    whether someone has a file open for read.  This seems fragile.  Use
    i_readcount instead.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/locks.c b/fs/locks.c
index ff6af2c32601..299abad65545 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
 	if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
 		return -EAGAIN;
 
-	if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
-	    (atomic_read(&inode->i_count) > 1)))
+	if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
 		ret = -EAGAIN;
 
 	return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 29d8e2cfed0e..e862de682da9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -676,7 +676,7 @@ struct inode {
 	atomic_t		i_count;
 	atomic_t		i_dio_count;
 	atomic_t		i_writecount;
-#ifdef CONFIG_IMA
+#if defined(CONFIG_IMA) || (defined_CONFIG_FILE_LOCKING)
 	atomic_t		i_readcount; /* struct files open RO */
 #endif
 	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
@@ -2869,7 +2869,7 @@ static inline bool inode_is_open_for_write(const struct inode *inode)
 	return atomic_read(&inode->i_writecount) > 0;
 }
 
-#ifdef CONFIG_IMA
+#if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING)
 static inline void i_readcount_dec(struct inode *inode)
 {
 	BUG_ON(!atomic_read(&inode->i_readcount));

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-09  4:04           ` Amir Goldstein
@ 2019-02-14 21:06             ` J. Bruce Fields
  0 siblings, 0 replies; 16+ messages in thread
From: J. Bruce Fields @ 2019-02-14 21:06 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeremy Allison, Linux NFS Mailing List, Volker.Lendecke,
	Jeff Layton, samba-technical, linux-fsdevel

On Sat, Feb 09, 2019 at 06:04:22AM +0200, Amir Goldstein wrote:
> On Sat, Feb 9, 2019 at 12:12 AM Jeremy Allison <jra@samba.org> wrote:
> >
> > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > >
> > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > >   instead of checking d_count and i_count.
> > > >
> > > > Independently of the rest, I'd love to do away with those
> > > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > > >
> > >
> > > It would look maybe something like this:
> > >
> > > static inline bool file_is_open_for_read(const struct inode *file)
> > > {
> > >         struct inode *inode = file_inode(file);
> > >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > FMODE_READ) ? 1 : 0;
> > >
> > >         return atomic_read(&inode->i_readcount) > countself;
> > > }
> > >
> > > And it would allow for acquiring F_WRLCK lease if other
> > > instances of inode are open O_PATH.
> > > A slight change of semantics that seems harmless(?)
> > > and will allow some flexibility.
> > >
> > > But if samba can't figure out a way to keep a single open file
> > > descriptor for oplocks per client-file, then this model doesn't
> > > help us make any progress.
> >
> > Samba uses a single file descriptor per SMB2 open file
> > handle. Is this what you meant ? We need this to keep
> > the per-handle OFD locks around.
> 
> I understand now there are several cases when smbd has
> several open file descriptors for the same client.
> Is that related to this comment in samba wiki about kernel oplocks?
> 
> "Linux kernel oplocks don't provide the needed features.
> (They don't even work correctly for oplocks...)"
> 
> Can you elaborate on that? is that because a samba oplock
> is per client and therefore other file opens from same client
> should not break its own lease?
> If that is the case, than Bruce's work on the "delegations"
> flavor of kernel oplocks could make them a good fit for samba.

After this:

	https://marc.info/?l=linux-nfs&m=154966239918297&w=2

delegations would no longer conflict with opens from the same tgid.  So
if your threads all run in the same process and you're willing to manage
conflicts among your own clients, that should still allow you to do
multiple opens of the same file without giving up your lease/delegation.

I'd be curious to know whether that works with Samba's design.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-14 20:51             ` J. Bruce Fields
@ 2019-02-15  7:31               ` Amir Goldstein
  2019-02-15 20:09                 ` J. Bruce Fields
  0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2019-02-15  7:31 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Thu, Feb 14, 2019 at 10:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > >   instead of checking d_count and i_count.
> > > > >
> > > > > Independently of the rest, I'd love to do away with those
> > > > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > > > >
> > > >
> > > > It would look maybe something like this:
> > > >
> > > > static inline bool file_is_open_for_read(const struct inode *file)
> > > > {
> > > >         struct inode *inode = file_inode(file);
> > > >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > > FMODE_READ) ? 1 : 0;
> > > >
> > > >         return atomic_read(&inode->i_readcount) > countself;
> > > > }
> > > >
> > > > And it would allow for acquiring F_WRLCK lease if other
> > > > instances of inode are open O_PATH.
> > > > A slight change of semantics that seems harmless(?)
> > > > and will allow some flexibility.
> > >
> > > How did I not know about i_readcount?  (Looking)  I guess it would mean
> > > adding some dependence on CONFIG_IMA, hm.
> > >
> >
> > Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> > I am not sure if the concern was size of struct inode
> > (shouldn't increase on 64bit arch) or the accounting on
> > open/close. The impact doesn't look significant (?)..
>
> Looks like the original patch was d984ea604943bb "fs: move i_readcount".
> I did some googling around and looked at the discussion summarized by
> https://lwn.net/Articles/410895/ but can't find useful discussion of
> i_readcount impact.
>
> Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.
>
> Maybe something like this?
>
> --b.
>
> commit 02cfda99ed8c
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Thu Feb 14 15:02:02 2019 -0500
>
>     locks: use i_readcount to detect lease conflicts
>
>     The lease code currently uses the inode and dentry refcounts to detect
>     whether someone has a file open for read.  This seems fragile.  Use
>     i_readcount instead.
>
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
>
> diff --git a/fs/locks.c b/fs/locks.c
> index ff6af2c32601..299abad65545 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
>         if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
>                 return -EAGAIN;
>
> -       if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
> -           (atomic_read(&inode->i_count) > 1)))
> +       if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
>                 ret = -EAGAIN;

Alas, i_readcount is not the count of file opens for read, it is the count
of file opens O_RDONLY, so this is incorrect wrt conflict with other writers.

I guess since there is a full smp_mb() before this check, then you
can check (i_readcount + i_writecount) > 1 || (i_writecount < 0)

You can also check if caller itself is O_RDONLY to know if self
count is expect to be in i_readcount or i_writecount, but not sure
it is worth the trouble.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Better interop for NFS/SMB file share mode/reservation
  2019-02-15  7:31               ` Amir Goldstein
@ 2019-02-15 20:09                 ` J. Bruce Fields
  0 siblings, 0 replies; 16+ messages in thread
From: J. Bruce Fields @ 2019-02-15 20:09 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jeff Layton, Volker.Lendecke, samba-technical,
	Linux NFS Mailing List, linux-fsdevel, Pavel Shilovsky

On Fri, Feb 15, 2019 at 09:31:48AM +0200, Amir Goldstein wrote:
> On Thu, Feb 14, 2019 at 10:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> > > On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > > >   instead of checking d_count and i_count.
> > > > > >
> > > > > > Independently of the rest, I'd love to do away with those
> > > > > > d_count/i_count checks.  What's inode_is_open_for_read()?
> > > > > >
> > > > >
> > > > > It would look maybe something like this:
> > > > >
> > > > > static inline bool file_is_open_for_read(const struct inode *file)
> > > > > {
> > > > >         struct inode *inode = file_inode(file);
> > > > >         int countself = (file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > > > FMODE_READ) ? 1 : 0;
> > > > >
> > > > >         return atomic_read(&inode->i_readcount) > countself;
> > > > > }
> > > > >
> > > > > And it would allow for acquiring F_WRLCK lease if other
> > > > > instances of inode are open O_PATH.
> > > > > A slight change of semantics that seems harmless(?)
> > > > > and will allow some flexibility.
> > > >
> > > > How did I not know about i_readcount?  (Looking)  I guess it would mean
> > > > adding some dependence on CONFIG_IMA, hm.
> > > >
> > >
> > > Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> > > I am not sure if the concern was size of struct inode
> > > (shouldn't increase on 64bit arch) or the accounting on
> > > open/close. The impact doesn't look significant (?)..
> >
> > Looks like the original patch was d984ea604943bb "fs: move i_readcount".
> > I did some googling around and looked at the discussion summarized by
> > https://lwn.net/Articles/410895/ but can't find useful discussion of
> > i_readcount impact.
> >
> > Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.
> >
> > Maybe something like this?
> >
> > --b.
> >
> > commit 02cfda99ed8c
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date:   Thu Feb 14 15:02:02 2019 -0500
> >
> >     locks: use i_readcount to detect lease conflicts
> >
> >     The lease code currently uses the inode and dentry refcounts to detect
> >     whether someone has a file open for read.  This seems fragile.  Use
> >     i_readcount instead.
> >
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index ff6af2c32601..299abad65545 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
> >         if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
> >                 return -EAGAIN;
> >
> > -       if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
> > -           (atomic_read(&inode->i_count) > 1)))
> > +       if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
> >                 ret = -EAGAIN;
> 
> Alas, i_readcount is not the count of file opens for read, it is the count
> of file opens O_RDONLY, so this is incorrect wrt conflict with other writers.

Whoops, thanks!

> I guess since there is a full smp_mb() before this check, then you
> can check (i_readcount + i_writecount) > 1 || (i_writecount < 0)
> 
> You can also check if caller itself is O_RDONLY to know if self
> count is expect to be in i_readcount or i_writecount, but not sure
> it is worth the trouble.

I don't know, it still looks reasonable.  I'll fool around with it.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-08 11:20 Better interop for NFS/SMB file share mode/reservation Amir Goldstein
2019-02-08 13:10 ` Jeff Layton
2019-02-08 14:45   ` Amir Goldstein
2019-02-08 15:50     ` J. Bruce Fields
2019-02-08 20:02       ` Amir Goldstein
2019-02-08 20:16         ` J. Bruce Fields
2019-02-08 20:31           ` Amir Goldstein
2019-02-14 20:51             ` J. Bruce Fields
2019-02-15  7:31               ` Amir Goldstein
2019-02-15 20:09                 ` J. Bruce Fields
2019-02-08 22:12         ` Jeremy Allison
2019-02-09  4:04           ` Amir Goldstein
2019-02-14 21:06             ` J. Bruce Fields
2019-02-08 16:03     ` Jeff Layton
2019-02-08 16:28       ` Jeffrey Layton
2019-02-11  5:31     ` ronnie sahlberg

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox