linux-fsdevel.vger.kernel.org archive mirror
* [GIT PULL] please pull file-locking related changes for v3.20
@ 2015-02-09 10:55 Jeff Layton
  2015-02-16 13:32 ` Kirill A. Shutemov
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2015-02-09 10:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, linux-kernel, J. Bruce Fields, Christoph Hellwig,
	Dave Chinner

The following changes since commit cb59670870d90ff8bc31f5f2efc407c6fe4938c0:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse (2015-01-16 14:58:16 +1300)

are available in the git repository at:

  git://git.samba.org/jlayton/linux.git tags/locks-v3.20-1

for you to fetch changes up to 8116bf4cb62d337c953cfa5369ef4cf83e73140c:

  locks: update comments that refer to inode->i_flock (2015-01-21 20:44:01 -0500)

----------------------------------------------------------------
File locking related changes for v3.20 (pile #1)

This patchset contains a fairly major overhaul of how file locks are
tracked within the inode. Rather than a single list, we now create a
per-inode "lock context" that contains individual lists for the file
locks, and a new dedicated spinlock for them.

There are changes in other trees that are based on top of this set so
it may be easiest to pull this in early.
----------------------------------------------------------------
Jeff Layton (13):
      locks: add new struct list_head to struct file_lock
      locks: have locks_release_file use flock_lock_file to release generic flock locks
      locks: add a new struct file_locking_context pointer to struct inode
      ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks
      locks: move flock locks to file_lock_context
      locks: convert posix locks to file_lock_context
      locks: convert lease handling to file_lock_context
      locks: remove i_flock field from struct inode
      locks: add a dedicated spinlock to protect i_flctx lists
      locks: clean up the lm_change prototype
      locks: keep a count of locks on the flctx lists
      locks: consolidate NULL i_flctx checks in locks_remove_file
      locks: update comments that refer to inode->i_flock

 fs/ceph/locks.c      |  64 ++++----
 fs/ceph/mds_client.c |   4 -
 fs/cifs/file.c       |  34 ++---
 fs/inode.c           |   3 +-
 fs/lockd/svcsubs.c   |  26 ++--
 fs/locks.c           | 569 +++++++++++++++++++++++++++++++++++++-------------------------------
 fs/nfs/delegation.c  |  23 +--
 fs/nfs/nfs4state.c   |  70 +++++----
 fs/nfs/pagelist.c    |   6 +-
 fs/nfs/write.c       |  41 ++++-
 fs/nfsd/nfs4state.c  |  21 ++-
 fs/read_write.c      |   2 +-
 include/linux/fs.h   |  52 +++++--
 13 files changed, 510 insertions(+), 405 deletions(-)

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-09 10:55 [GIT PULL] please pull file-locking related changes for v3.20 Jeff Layton
@ 2015-02-16 13:32 ` Kirill A. Shutemov
  2015-02-16 14:00   ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: Kirill A. Shutemov @ 2015-02-16 13:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Christoph Hellwig, Dave Chinner

On Mon, Feb 09, 2015 at 05:55:40AM -0500, Jeff Layton wrote:
> The following changes since commit cb59670870d90ff8bc31f5f2efc407c6fe4938c0:
> 
>   Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse (2015-01-16 14:58:16 +1300)
> 
> are available in the git repository at:
> 
>   git://git.samba.org/jlayton/linux.git tags/locks-v3.20-1
> 
> for you to fetch changes up to 8116bf4cb62d337c953cfa5369ef4cf83e73140c:
> 
>   locks: update comments that refer to inode->i_flock (2015-01-21 20:44:01 -0500)
> 
> ----------------------------------------------------------------
> File locking related changes for v3.20 (pile #1)
> 
> This patchset contains a fairly major overhaul of how file locks are
> tracked within the inode. Rather than a single list, we now create a
> per-inode "lock context" that contains individual lists for the file
> locks, and a new dedicated spinlock for them.
> 
> There are changes in other trees that are based on top of this set so
> it may be easiest to pull this in early.

The warning below is triggered on exit from trinity via ctrl-c. I've seen
it a few times.

[  733.480323] ------------[ cut here ]------------
[  733.480985] WARNING: CPU: 1 PID: 24375 at /home/kas/git/public/linux-next/fs/locks.c:243 locks_free_lock_context+0x6a/0xd0()
[  733.482393] Modules linked in:
[  733.482807] CPU: 1 PID: 24375 Comm: trinity-main Not tainted 3.19.0-next-20150212-00024-g8d751e144a78-dirty #641
[  733.484108] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[  733.485713]  ffffffff81c9c150 ffff88084b88bc88 ffffffff81883d17 0000000000000007
[  733.486708]  0000000000000000 ffff88084b88bcc8 ffffffff81054eca ffff88007a31dc30
[  733.487701]  ffff88007060fda0 ffff88007a31dc30 ffffffff81a27940 ffff88007a31dc30
[  733.488695] Call Trace:
[  733.489017]  [<ffffffff81883d17>] dump_stack+0x4f/0x7b
[  733.489678]  [<ffffffff81054eca>] warn_slowpath_common+0x8a/0xc0
[  733.490431]  [<ffffffff81054fba>] warn_slowpath_null+0x1a/0x20
[  733.491165]  [<ffffffff8122d74a>] locks_free_lock_context+0x6a/0xd0
[  733.491959]  [<ffffffff811f1ba2>] __destroy_inode+0x32/0xe0
[  733.492715]  [<ffffffff811f3006>] destroy_inode+0x26/0x70
[  733.493394]  [<ffffffff811f3162>] evict+0x112/0x1a0
[  733.494026]  [<ffffffff811f3d1e>] iput+0x2be/0x3d0
[  733.494635]  [<ffffffff811ee480>] __dentry_kill+0x190/0x200
[  733.495341]  [<ffffffff811ee88b>] dput+0x39b/0x3d0
[  733.495946]  [<ffffffff811ee519>] ? dput+0x29/0x3d0
[  733.496581]  [<ffffffff811d618c>] __fput+0x14c/0x220
[  733.497218]  [<ffffffff811d62ae>] ____fput+0xe/0x10
[  733.497847]  [<ffffffff810785e4>] task_work_run+0xb4/0xe0
[  733.498532]  [<ffffffff810586ce>] do_exit+0x36e/0xd60
[  733.499175]  [<ffffffff8188e5d1>] ? retint_swapgs+0xe/0x44
[  733.499868]  [<ffffffff8143ce13>] ? __this_cpu_preempt_check+0x13/0x20
[  733.500695]  [<ffffffff81059164>] do_group_exit+0x54/0xe0
[  733.501388]  [<ffffffff81059204>] SyS_exit_group+0x14/0x20
[  733.502080]  [<ffffffff8188da12>] system_call_fastpath+0x12/0x17
[  733.502874] ---[ end trace bcc1a9752062721f ]---

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 13:32 ` Kirill A. Shutemov
@ 2015-02-16 14:00   ` Jeff Layton
  2015-02-16 18:46     ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2015-02-16 14:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, 16 Feb 2015 15:32:00 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Mon, Feb 09, 2015 at 05:55:40AM -0500, Jeff Layton wrote:
> > The following changes since commit cb59670870d90ff8bc31f5f2efc407c6fe4938c0:
> > 
> >   Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse (2015-01-16 14:58:16 +1300)
> > 
> > are available in the git repository at:
> > 
> >   git://git.samba.org/jlayton/linux.git tags/locks-v3.20-1
> > 
> > for you to fetch changes up to 8116bf4cb62d337c953cfa5369ef4cf83e73140c:
> > 
> >   locks: update comments that refer to inode->i_flock (2015-01-21 20:44:01 -0500)
> > 
> > ----------------------------------------------------------------
> > File locking related changes for v3.20 (pile #1)
> > 
> > This patchset contains a fairly major overhaul of how file locks are
> > tracked within the inode. Rather than a single list, we now create a
> > per-inode "lock context" that contains individual lists for the file
> > locks, and a new dedicated spinlock for them.
> > 
> > There are changes in other trees that are based on top of this set so
> > it may be easiest to pull this in early.
> 
> The warning below is triggered on exit from trinity via ctrl-c. I've seen
> it a few times.
> 
> [  733.480323] ------------[ cut here ]------------
> [  733.480985] WARNING: CPU: 1 PID: 24375 at /home/kas/git/public/linux-next/fs/locks.c:243 locks_free_lock_context+0x6a/0xd0()
> [  733.482393] Modules linked in:
> [  733.482807] CPU: 1 PID: 24375 Comm: trinity-main Not tainted 3.19.0-next-20150212-00024-g8d751e144a78-dirty #641
> [  733.484108] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
> [  733.485713]  ffffffff81c9c150 ffff88084b88bc88 ffffffff81883d17 0000000000000007
> [  733.486708]  0000000000000000 ffff88084b88bcc8 ffffffff81054eca ffff88007a31dc30
> [  733.487701]  ffff88007060fda0 ffff88007a31dc30 ffffffff81a27940 ffff88007a31dc30
> [  733.488695] Call Trace:
> [  733.489017]  [<ffffffff81883d17>] dump_stack+0x4f/0x7b
> [  733.489678]  [<ffffffff81054eca>] warn_slowpath_common+0x8a/0xc0
> [  733.490431]  [<ffffffff81054fba>] warn_slowpath_null+0x1a/0x20
> [  733.491165]  [<ffffffff8122d74a>] locks_free_lock_context+0x6a/0xd0
> [  733.491959]  [<ffffffff811f1ba2>] __destroy_inode+0x32/0xe0
> [  733.492715]  [<ffffffff811f3006>] destroy_inode+0x26/0x70
> [  733.493394]  [<ffffffff811f3162>] evict+0x112/0x1a0
> [  733.494026]  [<ffffffff811f3d1e>] iput+0x2be/0x3d0
> [  733.494635]  [<ffffffff811ee480>] __dentry_kill+0x190/0x200
> [  733.495341]  [<ffffffff811ee88b>] dput+0x39b/0x3d0
> [  733.495946]  [<ffffffff811ee519>] ? dput+0x29/0x3d0
> [  733.496581]  [<ffffffff811d618c>] __fput+0x14c/0x220
> [  733.497218]  [<ffffffff811d62ae>] ____fput+0xe/0x10
> [  733.497847]  [<ffffffff810785e4>] task_work_run+0xb4/0xe0
> [  733.498532]  [<ffffffff810586ce>] do_exit+0x36e/0xd60
> [  733.499175]  [<ffffffff8188e5d1>] ? retint_swapgs+0xe/0x44
> [  733.499868]  [<ffffffff8143ce13>] ? __this_cpu_preempt_check+0x13/0x20
> [  733.500695]  [<ffffffff81059164>] do_group_exit+0x54/0xe0
> [  733.501388]  [<ffffffff81059204>] SyS_exit_group+0x14/0x20
> [  733.502080]  [<ffffffff8188da12>] system_call_fastpath+0x12/0x17
> [  733.502874] ---[ end trace bcc1a9752062721f ]---
> 

Thanks Kirill.

This looks similar to the problem that Sasha Levin reported when I had
this in linux-next. Basically, we're tearing down the i_flctx when
tearing down the inode, but the lock lists in it aren't empty.

Most likely what happened is that a flock lock raced onto the inode
around the same time that locks_remove_file was called for the file.
Moving the setting of i_flctx under the i_lock seemed to resolve this
for Sasha before, but I think this shows that that fix isn't 100%
sufficient.

So I suspect we have either a cache-coherency problem where the cpu
running locks_remove_file doesn't realize that the i_flctx was recently
set, or there's something wrong with the fget/fput machinery. Probably
the former.

I'll look at it again and also see if I can reproduce it with trinity.
So far, I haven't been able to, but maybe I'll get lucky this time.

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 14:00   ` Jeff Layton
@ 2015-02-16 18:46     ` Linus Torvalds
  2015-02-16 19:24       ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2015-02-16 18:46 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, Feb 16, 2015 at 6:00 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
>
> I'll look at it again and also see if I can reproduce it with trinity.
> So far, I haven't been able to, but maybe I'll get lucky this time.

Can you please also remove the completely broken counters?

The f*ckers aren't even initialized, they are never used outside of
completely broken cifs code (hint: it doesn't take the spinlock, so it
just uses the counters as a random-number-generator), and even there
it would be better to just count the list instead of maintaining a
count of list entries - and doing it *wrong*.

The lock counters are broken. Get rid of them. Seriously. The reason I
care is that I tried to read the code to manage the locks in
fs/locks.c, and just passing the pointer to the counter around made it
unreadable. And when I actually tried to read it, and look at the
initialization, I see that they are never initialized.

This code is so broken that my initial reaction is "We need to just
revert the crap".

And no. The fix is *not* to just initialize those stupid things to
zero. The fix is to remove them.

                           Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 18:46     ` Linus Torvalds
@ 2015-02-16 19:24       ` Linus Torvalds
  2015-02-16 19:59         ` Jeff Layton
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2015-02-16 19:24 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, Feb 16, 2015 at 10:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This code is so broken that my initial reaction is "We need to just
> revert the crap".

How the hell is flock_lock_file() supposed to work at all, btw?

Say we have an existing flock, and now do a new one that conflicts. I
see what looks like three separate bugs.

 - We go through the first loop, find a lock of another type, and
delete it in preparation for replacing it

 - we *drop* the lock context spinlock.

 - BUG #1? So now there is no lock at all, and somebody can come in
and see that unlocked state. Is that really valid?

 - another thread comes in while the first thread dropped the lock
context lock, and wants to add its own lock. It doesn't see the
deleted or pending locks, so it just adds it

 - the first thread gets the context spinlock again, and adds the lock
that replaced the original

 - BUG #2? So now there are *two* locks on the thing, and the next
time you do an unlock (or when you close the file), it will only
remove/replace the first one.

Both of those bugs are due to the whole "drop the lock in the middle",
which is pretty much always a mistake.  BUG#2 could easily explain the
warning Kirill reports, afaik.

BUG#3 seems to be independent, and is about somebody replacing an
existing lock, but the new lock conflicts. Again, the first loop will
remove the old lock, and then the second loop will see the conflict,
and return an error (and we may then end up waiting for it for the
FILE_LOCK_DEFERRED case). Now the original lock is gone. Is that
really right? That sounds bogus. *Failing* to insert a flock causing
the old flock to go away?

Now, flock semantics are pretty much insane, so maybe all these bugs
except for #2 aren't actually bugs, and are "features" of flock. But
bug #2 can't be a semantic feature.

Is there something I'm missing here?

This was all just looking at a *single* function. Quite frankly, I
hate how the code also just does

        if (filp->f_op->flock)
                filp->f_op->flock(filp, F_SETLKW, &fl);
        else
                flock_lock_file(filp, &fl);

and blithely assumes that some random filesystem will get the flock
semantics right, when even the core code screwed it up this badly.

And maybe I'm wrong, and there's some reason why none of the things
above can actually happen, but it looks really bad to me.

                         Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 19:24       ` Linus Torvalds
@ 2015-02-16 19:59         ` Jeff Layton
  2015-02-17  0:02         ` Jeff Layton
  2015-02-17 19:08         ` J. Bruce Fields
  2 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2015-02-16 19:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, 16 Feb 2015 11:24:03 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Feb 16, 2015 at 10:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > This code is so broken that my initial reaction is "We need to just
> > revert the crap".

Sure, no problem reverting the lock counters. They were an optional
thing anyway...

> 
> How the hell is flock_lock_file() supposed to work at all, btw?
> 
> Say we have an existing flock, and now do a new one that conflicts. I
> see what looks like three separate bugs.
> 
>  - We go through the first loop, find a lock of another type, and
> delete it in preparation for replacing it
> 
>  - we *drop* the lock context spinlock.
> 
>  - BUG #1? So now there is no lock at all, and somebody can come in
> and see that unlocked state. Is that really valid?
> 
>  - another thread comes in while the first thread dropped the lock
> context lock, and wants to add its own lock. It doesn't see the
> deleted or pending locks, so it just adds it
> 
>  - the first thread gets the context spinlock again, and adds the lock
> that replaced the original
> 
>  - BUG #2? So now there are *two* locks on the thing, and the next
> time you do an unlock (or when you close the file), it will only
> remove/replace the first one.
> 
> Both of those bugs are due to the whole "drop the lock in the middle",
> which is pretty much always a mistake.  BUG#2 could easily explain the
> warning Kirill reports, afaik.
> 

Ahh, well spotted.

That drop the lock in the middle thing has always looked a little fishy
to me, but I had convinced myself a while back that it was OK. I think
you're correct that it's not, however.

I'll spin up a patch to remove it and we can see if that helps the
problem that Kirill saw.

> BUG#3 seems to be independent, and is about somebody replacing an
> existing lock, but the new lock conflicts. Again, the first loop will
> remove the old lock, and then the second loop will see the conflict,
> and return an error (and we may then end up waiting for it for the
> FILE_LOCK_DEFERRED case). Now the original lock is gone. Is that
> really right? That sounds bogus. *Failing* to insert a flock causing
> the old flock to go away?
> 

No, I think you're correct here.

The main problem is a LOCK_SH -> LOCK_EX upgrade. If there are other
LOCK_SH locks on the file, then you'll both lose your lock and fail to
get the LOCK_EX lock.

I think that's fixable by ensuring that we don't actually remove the
lock until we're sure that we can replace it. I'll see about fixing
that up as well. Give me a couple of days to get those cleaned up and
tested, and I'll post some patches that close the holes here.

At that point we can see if that fixes Kirill's problem.

> Now, flock semantics are pretty much insane, so maybe all these bugs
> except for #2 aren't actually bugs, and are "features" of flock. But
> bug #2 can't be a semantic feature.
> 
> Is there something I'm missing here?
> 

No, I think they're all bugs. flock semantics aren't great, but I'm
pretty sure you shouldn't lose your lock just because you tried to
upgrade one and failed to do so.

> This was all just looking at a *single* function. Quite frankly, I
> hate how the code also just does
> 
>         if (filp->f_op->flock)
>                 filp->f_op->flock(filp, F_SETLKW, &fl);
>         else
>                 flock_lock_file(filp, &fl);
> 
> and blithely assumes that some random filesystem will get the flock
> semantics right, when even the core code screwed it up this badly.
> 

Sigh yeah. Most of the filesystems that define ->flock (or ->lock for
that matter) are remote or distributed filesystems, where we leave it
up to the server or lock manager (e.g. DLM) to get it right. I think we
don't have much choice but to assume that they do...

> And maybe I'm wrong, and there's some reason why none of the things
> above can actually happen, but it looks really bad to me.
> 
>                          Linus

Thanks for helping sanity check this stuff. 
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 19:24       ` Linus Torvalds
  2015-02-16 19:59         ` Jeff Layton
@ 2015-02-17  0:02         ` Jeff Layton
  2015-02-17  0:21           ` Linus Torvalds
  2015-02-17 19:08         ` J. Bruce Fields
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2015-02-17  0:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, 16 Feb 2015 11:24:03 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Feb 16, 2015 at 10:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > This code is so broken that my initial reaction is "We need to just
> > revert the crap".
> 
> How the hell is flock_lock_file() supposed to work at all, btw?
> 
> Say we have an existing flock, and now do a new one that conflicts. I
> see what looks like three separate bugs.
> 
>  - We go through the first loop, find a lock of another type, and
> delete it in preparation for replacing it
> 
>  - we *drop* the lock context spinlock.
> 
>  - BUG #1? So now there is no lock at all, and somebody can come in
> and see that unlocked state. Is that really valid?
> 
>  - another thread comes in while the first thread dropped the lock
> context lock, and wants to add its own lock. It doesn't see the
> deleted or pending locks, so it just adds it
> 
>  - the first thread gets the context spinlock again, and adds the lock
> that replaced the original
> 
>  - BUG #2? So now there are *two* locks on the thing, and the next
> time you do an unlock (or when you close the file), it will only
> remove/replace the first one.
> 
> Both of those bugs are due to the whole "drop the lock in the middle",
> which is pretty much always a mistake.  BUG#2 could easily explain the
> warning Kirill reports, afaik.
> 
> BUG#3 seems to be independent, and is about somebody replacing an
> existing lock, but the new lock conflicts. Again, the first loop will
> remove the old lock, and then the second loop will see the conflict,
> and return an error (and we may then end up waiting for it for the
> FILE_LOCK_DEFERRED case). Now the original lock is gone. Is that
> really right? That sounds bogus. *Failing* to insert a flock causing
> the old flock to go away?
> 
> Now, flock semantics are pretty much insane, so maybe all these bugs
> except for #2 aren't actually bugs, and are "features" of flock. But
> bug #2 can't be a semantic feature.
> 
> Is there something I'm missing here?
> 
> This was all just looking at a *single* function. Quite frankly, I
> hate how the code also just does
> 
>         if (filp->f_op->flock)
>                 filp->f_op->flock(filp, F_SETLKW, &fl);
>         else
>                 flock_lock_file(filp, &fl);
> 
> and blithely assumes that some random filesystem will get the flock
> semantics right, when even the core code screwed it up this badly.
> 
> And maybe I'm wrong, and there's some reason why none of the things
> above can actually happen, but it looks really bad to me.
> 
>                          Linus

Now that I look, it may be best to just revert this whole set for now.
Linus, are you amenable to doing that?

While I think this could be a nice cleanup, it obviously needs more
testing and scrutiny, and it won't hurt to wait another release or two
to make sure I get this right. There may be some merge conflicts with
Bruce's tree, but I'd rather deal with those than break the file
locking code.

Once we do that, then we could introduce some smaller patches to fix
up the bug you spotted, but at least we'll be proceeding from a spot
that is known to work.

I'll start preparing a pull request that does that...

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17  0:02         ` Jeff Layton
@ 2015-02-17  0:21           ` Linus Torvalds
  2015-02-17  0:35             ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2015-02-17  0:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

[-- Attachment #1: Type: text/plain, Size: 1124 bytes --]

On Mon, Feb 16, 2015 at 4:02 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
>
> Now that I look, it may be best to just revert this whole set for now.
> Linus, are you amenable to doing that?

Sure. But I'd prefer seeing how hard it would be to fix things first.
If this was at the end of the release cycle, I'd revert it
immediately. As it is, try to see how hard it is.

The bugs I found might be as easy as just the attached (TOTALLY
UNTESTED) patch. The comment about a higher-priority process and
cond_resched() is just complete and utter crap. If somebody holds a
read lock and upgrades it to a write lock, there is absolutely *zero*
reason to let some higher-priority process get the write-lock first -
that's just simply semantically wrong bullshit. "Higher priority" does
not mean "can punch through locks".

Removing the silly incorrect counts should be trivial too. There
really are not very many users, and they can just walk the list
instead.

Now, if you've found other more fundamental bugs that look unfixable,
then that might mean that reverting it all is unavoidable, but..

                        Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1813 bytes --]

 fs/locks.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 4753218f308e..8fbf81429608 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -867,12 +867,11 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
  */
 static int flock_lock_file(struct file *filp, struct file_lock *request)
 {
-	struct file_lock *new_fl = NULL;
+	struct file_lock *new_fl = NULL, *old_fl = NULL;
 	struct file_lock *fl;
 	struct file_lock_context *ctx;
 	struct inode *inode = file_inode(filp);
 	int error = 0;
-	bool found = false;
 	LIST_HEAD(dispose);
 
 	ctx = locks_get_lock_context(inode);
@@ -894,27 +893,18 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 			continue;
 		if (request->fl_type == fl->fl_type)
 			goto out;
-		found = true;
-		locks_delete_lock_ctx(fl, &ctx->flc_flock_cnt, &dispose);
+		old_fl = NULL;
 		break;
 	}
 
 	if (request->fl_type == F_UNLCK) {
-		if ((request->fl_flags & FL_EXISTS) && !found)
+		if (old_fl)
+			locks_delete_lock_ctx(old_fl, &ctx->flc_flock_cnt, &dispose);
+		else if (request->fl_flags & FL_EXISTS)
 			error = -ENOENT;
 		goto out;
 	}
 
-	/*
-	 * If a higher-priority process was blocked on the old file lock,
-	 * give it the opportunity to lock the file.
-	 */
-	if (found) {
-		spin_unlock(&ctx->flc_lock);
-		cond_resched();
-		spin_lock(&ctx->flc_lock);
-	}
-
 find_conflict:
 	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
 		if (!flock_locks_conflict(request, fl))
@@ -928,6 +918,8 @@ find_conflict:
 	}
 	if (request->fl_flags & FL_ACCESS)
 		goto out;
+	if (old_fl)
+		locks_delete_lock_ctx(old_fl, &ctx->flc_flock_cnt, &dispose);
 	locks_copy_lock(new_fl, request);
 	locks_insert_lock_ctx(new_fl, &ctx->flc_flock_cnt, &ctx->flc_flock);
 	new_fl = NULL;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17  0:21           ` Linus Torvalds
@ 2015-02-17  0:35             ` Jeff Layton
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2015-02-17  0:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, linux-fsdevel, Linux Kernel Mailing List,
	J. Bruce Fields, Christoph Hellwig, Dave Chinner, Sasha Levin

On Mon, 16 Feb 2015 16:21:30 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Feb 16, 2015 at 4:02 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
> >
> > Now that I look, it may be best to just revert this whole set for now.
> > Linus, are you amenable to doing that?
> 
> Sure. But I'd prefer seeing how hard it would be to fix things first.
> If this was at the end of the release cycle, I'd revert it
> immediately. As it is, try to see how hard it is.
> 

Fair enough. I just didn't want to hold up -rc1. I should be able to
fix up the bugs within the next day or so.

I've got a small stack of fixes that I'll send along soon.

> The bugs I found might be as easy as just the attached (TOTALLY
> UNTESTED) patch. The comment about a higher-priority process and
> sched_resced() is just complete and utter crap. If somebody holds a
> read lock and upgrades it to a write lock, there is absolutely *zero*
> reason to let some higher-priority process get the write-lock first -
> that's just simply semantically wrong bullshit. "Higher priority" does
> not mean "can punch through locks".
> 

The patch is pretty close to one that I have, so yes I think that will fix
it. There is one bug in the first loop though -- "old_fl" should be set
to "fl" there.

I'm also happy to remove the "drop the spinlock" thing. It's bothered
me for a while...

> Removing the silly incorrect counts should be trivial too. There
> really are not very many users, and they can just walk the list
> instead.
> 

Yes, that's a straightforward revert.

> Now, if you've found other more fundamental bugs that look unfixable,
> then that might mean that reverting it all is unavoidable, but..
> 
>                         Linus

No, I don't think there's anything unfixable there. I did find another
bug, but it's simple to fix.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-16 19:24       ` Linus Torvalds
  2015-02-16 19:59         ` Jeff Layton
  2015-02-17  0:02         ` Jeff Layton
@ 2015-02-17 19:08         ` J. Bruce Fields
  2015-02-17 19:13           ` Linus Torvalds
  2 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2015-02-17 19:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Mon, Feb 16, 2015 at 11:24:03AM -0800, Linus Torvalds wrote:
> On Mon, Feb 16, 2015 at 10:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > This code is so broken that my initial reaction is "We need to just
> > revert the crap".
> 
> How the hell is flock_lock_file() supposed to work at all, btw?
> 
> Say we have an existing flock, and now do a new one that conflicts. I
> see what looks like three separate bugs.
> 
>  - We go through the first loop, find a lock of another type, and
> delete it in preparation for replacing it
> 
>  - we *drop* the lock context spinlock.
> 
>  - BUG #1? So now there is no lock at all, and somebody can come in
> and see that unlocked state. Is that really valid?
> 
>  - another thread comes in while the first thread dropped the lock
> context lock, and wants to add its own lock. It doesn't see the
> deleted or pending locks, so it just adds it
> 
>  - the first thread gets the context spinlock again, and adds the lock
> that replaced the original
> 
>  - BUG #2? So now there are *two* locks on the thing, and the next
> time you do an unlock (or when you close the file), it will only
> remove/replace the first one.
> 
> Both of those bugs are due to the whole "drop the lock in the middle",
> which is pretty much always a mistake.  BUG#2 could easily explain the
> warning Kirill reports, afaik.
> 
> BUG#3 seems to be independent, and is about somebody replacing an
> existing lock, but the new lock conflicts. Again, the first loop will
> remove the old lock, and then the second loop will see the conflict,
> and return an error (and we may then end up waiting for it for the
> FILE_LOCK_DEFERRED case). Now the original lock is gone. Is that
> really right? That sounds bogus. *Failing* to insert a flock causing
> the old flock to go away?

From flock(2):

	Converting a lock (shared to exclusive, or vice versa) is  not
	guaranteed  to  be atomic: the existing lock is first removed,
	and then a new lock is established.  Between these two steps, a
	pending  lock  request by  another process may be granted, with
	the result that the conversion either blocks, or fails if
	LOCK_NB was specified.

I also checked Michael Kerrisk's book quickly and see similar language
plus "... the conversion will fail and the process will lose its
original lock".

I don't have a quick way to check BSD, but it looks to me like this is
the way Linux has always behaved.

I agree that it's weird, but I think it's what we're stuck with.

--b.
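[Editor's note: the man-page behavior Bruce quotes is easy to reproduce from userspace. The sketch below is written for this note (it is not code from the thread): two independent open()s of the same path create two open file descriptions, i.e. two separate flock() owners, and a failed non-blocking upgrade by one owner silently drops its original lock.]

```python
# Demonstrating the flock(2) text quoted above: converting a lock is
# not atomic, and a failed LOCK_NB conversion loses the original lock.
# Standalone Linux demo written for this note, not code from the thread.
import fcntl
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
tmp.close()

fd1 = os.open(path, os.O_RDWR)   # owner 1: one open file description
fd2 = os.open(path, os.O_RDWR)   # owner 2: a second open file description

fcntl.flock(fd1, fcntl.LOCK_SH)  # both owners take shared locks
fcntl.flock(fd2, fcntl.LOCK_SH)

# Owner 1 tries SH -> EX: the kernel removes the old SH lock first,
# then sees owner 2's conflicting SH lock and fails the LOCK_NB
# request -- and owner 1's original lock is now gone.
upgrade_failed = False
try:
    fcntl.flock(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    upgrade_failed = True

# Because owner 1 lost its lock above, owner 2's own conversion now
# succeeds immediately -- proving owner 1's lock really was dropped.
fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)

os.close(fd1)
os.close(fd2)
os.unlink(path)
print(upgrade_failed)
```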

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:08         ` J. Bruce Fields
@ 2015-02-17 19:13           ` Linus Torvalds
  2015-02-17 19:27             ` Jeff Layton
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2015-02-17 19:13 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, Feb 17, 2015 at 11:08 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> I agree that it's weird, but I think it's what we're stuck with.

And if by "weird" you mean "flock is really not a well-defined or sane
interface", I'll agree with you.

That said, I'm not at all sure about the "we're stuck with it". We can
improve the semantics without anybody noticing, because it's not like
anybody could *depend* on the weaker semantics - they needed
particular races and timings to hit anyway.

                          Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:13           ` Linus Torvalds
@ 2015-02-17 19:27             ` Jeff Layton
  2015-02-17 19:41               ` Linus Torvalds
  2015-02-17 19:29             ` Linus Torvalds
  2015-02-26 11:00             ` One Thousand Gnomes
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2015-02-17 19:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: J. Bruce Fields, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, 17 Feb 2015 11:13:39 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Feb 17, 2015 at 11:08 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > I agree that it's weird, but I think it's what we're stuck with.
> 
> And if by "weird" you mean "flock is really not a well-defined or sane
> interface", I'll agree with you.
> 
> That said, I'm not at all sure about the "we're stuck with it". We can
> improve the semantics without anybody noticing, because it's not like
> anybody could *depend* on the weaker semantics - they needed
> particular races and timings to hit anyway.
> 
>                           Linus

I'm not sure we want to make that change here and now though. That's
something that really ought to be approached a bit more carefully since
we might break some userland apps that depend on this (admittedly
strange) behavior.

What about this instead then?

- leave the "drop the spinlock" thing in place in flock_lock_file for
  v3.20

- change locks_remove_flock to just walk the list and delete any locks
  associated with the filp being closed

That should pretty closely mirror the behavior of v3.19.

Yes, that leaves the bug in place where you can end up with two locks
associated with the same filp, but that's the way it has worked now for
years. I'm leery of changing that behavior in the context of this set.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:13           ` Linus Torvalds
  2015-02-17 19:27             ` Jeff Layton
@ 2015-02-17 19:29             ` Linus Torvalds
  2015-02-26 11:00             ` One Thousand Gnomes
  2 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2015-02-17 19:29 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, Feb 17, 2015 at 11:13 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, I'm not at all sure about the "we're stuck with it". We can
> improve the semantics without anybody noticing, because it's not like
> anybody could *depend* on the weaker semantics - they needed
> particular races and timings to hit anyway.

.. that said, it's true that we cannot do the FILE_LOCK_DEFERRED thing
for multiple blockers that already hold a lock. One of them has to
release its lock for other lockers to make progress. So I guess the
weak model of dropping a read lock before taking a write lock is
actually required.

That sleep in the middle with dropping the lock is still complete
crap, though. You can't do it. If you drop the lock, you have to
repeat the while cycle, not just sleep and continue.

Or just go to sleep, waiting for the conflicting lock to also be dropped.

So maybe just removing that whole "if (found) { reschedule }" - but
leaving the "drop early" part - is the right thing to do.

                          Linus
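[Editor's note: the two-locks-on-one-filp outcome (BUG #2 earlier in the thread) can be modeled deterministically in userspace. In this sketch, all names are invented and only the control flow mirrors the 2015-era flock_lock_file(): `yield` stands in for the window where the spinlock was dropped, and flock locks on the same open file description never conflict, so task A happily re-adds its lock next to the one task B slipped in.]

```python
# Deterministic model of the race: "yield" marks the point where
# flock_lock_file() dropped ctx->flc_lock.  Names invented for the
# sketch; locks are (filp, type) tuples in a plain list.

def conflicts(filp, typ, fl_filp, fl_typ):
    # flock locks on the same open file description never conflict
    return filp is not fl_filp and "EX" in (typ, fl_typ)

def flock_lock_buggy(locks, filp, typ):
    found = False
    for fl in list(locks):              # first loop: remove our old lock
        if fl[0] is filp and fl[1] != typ:
            locks.remove(fl)
            found = True
    if found:
        yield                           # spinlock dropped: others may run
    for fl in locks:                    # find_conflict:
        if conflicts(filp, typ, fl[0], fl[1]):
            return                      # -EAGAIN (simplified)
    locks.append((filp, typ))

def run(gen):                           # drive a task to completion
    for _ in gen:
        pass

filp = object()                         # one shared open file description
locks = []
run(flock_lock_buggy(locks, filp, "SH"))    # initial shared lock

task_a = flock_lock_buggy(locks, filp, "EX")
next(task_a)                            # A removes SH, drops the "spinlock"
run(flock_lock_buggy(locks, filp, "EX"))    # B races in and sets EX
run(task_a)                             # A resumes: same filp, "no conflict"

print(len(locks))                       # two exclusive locks on one filp
```

With the yield (and the rescheduling it models) removed, task A's remove/check/insert runs as one step and the duplicate cannot occur.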

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:27             ` Jeff Layton
@ 2015-02-17 19:41               ` Linus Torvalds
  2015-02-17 19:45                 ` J. Bruce Fields
  2015-02-17 20:12                 ` Jeff Layton
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2015-02-17 19:41 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, Feb 17, 2015 at 11:27 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
>
> What about this instead then?

No. Really.

> - leave the "drop the spinlock" thing in place in flock_lock_file for
>   v3.20

No. The whole concept of "drop the lock in the middle" is *BROKEN*.
It's seriously crap. It's not just a bug, it's a really fundamentally
wrong thing to do.

> - change locks_remove_flock to just walk the list and delete any locks
>   associated with the filp being closed

No. That's still wrong. You can have two people holding a write-lock.
Seriously. That's *shit*.

The "drop the spinlock in the middle" must go. There's not even any
reason for it. Just get rid of it. There can be no deadlock if you get
rid of it, because

 - we hold the flc_lock over the whole event, so we can never see any
half-way state

 - if we actually decide to sleep (due to conflicting locks) and
return FILE_LOCK_DEFERRED, we will drop the lock before actually
sleeping, so nobody else will be deadlocking on this file lock. So any
*other* person who tries to do an upgrade will not sleep, because the
pending upgrade will have moved to the blocking list (that whole
"locks_insert_block" part.

Ergo, either we'll upgrade the lock (atomically, within flc_lock), or
we will drop the lock (possibly moving it to the blocking list). I
don't see a deadlock.

I think your (and mine - but mine had the more fundamental problem of
never setting "old_fl" correctly at all) patch had a deadlock because
you didn't actually remove the old lock when you returned
FILE_LOCK_DEFERRED.

But I think the correct minimal patch is actually to just remove the
"if (found)" statement.

                       Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:41               ` Linus Torvalds
@ 2015-02-17 19:45                 ` J. Bruce Fields
  2015-02-17 20:12                 ` Jeff Layton
  1 sibling, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2015-02-17 19:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, Feb 17, 2015 at 11:41:40AM -0800, Linus Torvalds wrote:
> On Tue, Feb 17, 2015 at 11:27 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
> >
> > What about this instead then?
> 
> No. Really.
> 
> > - leave the "drop the spinlock" thing in place in flock_lock_file for
> >   v3.20
> 
> No. The whole concept of "drop the lock in the middle" is *BROKEN*.
> It's seriously crap. It's not just a bug, it's a really fundamentally
> wrong thing to do.
> 
> > - change locks_remove_flock to just walk the list and delete any locks
> >   associated with the filp being closed
> 
> No. That's still wrong. You can have two people holding a write-lock.
> Seriously. That's *shit*.
> 
> The "drop the spinlock in the middle" must go. There's not even any
> reason for it. Just get rid of it. There can be no deadlock if you get
> rid of it, because
> 
>  - we hold the flc_lock over the whole event, so we can never see any
> half-way state
> 
>  - if we actually decide to sleep (due to conflicting locks) and
> return FILE_LOCK_DEFERRED, we will drop the lock before actually
> sleeping, so nobody else will be deadlocking on this file lock. So any
> *other* person who tries to do an upgrade will not sleep, because the
> pending upgrade will have moved to the blocking list (that whole
> "locks_insert_block" part.

Whoops, you're right, I was forgetting that wait happens up in
flock_lock_file_wait(), OK.

--b.

> Ergo, either we'll upgrade the lock (atomically, within flc_lock), or
> we will drop the lock (possibly moving it to the blocking list). I
> don't see a deadlock.
> 
> I think your (and mine - but mine had the more fundamental problem of
> never setting "old_fl" correctly at all) patch had a deadlock because
> you didn't actually remove the old lock when you returned
> FILE_LOCK_DEFERRED.
> 
> But I think the correct minimal patch is actually to just remove the
> "if (found)" statement.
> 
>                        Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:41               ` Linus Torvalds
  2015-02-17 19:45                 ` J. Bruce Fields
@ 2015-02-17 20:12                 ` Jeff Layton
  2015-02-17 20:17                   ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2015-02-17 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: J. Bruce Fields, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

[-- Attachment #1: Type: text/plain, Size: 3072 bytes --]

On Tue, 17 Feb 2015 11:41:40 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Feb 17, 2015 at 11:27 AM, Jeff Layton <jlayton@poochiereds.net> wrote:
> >
> > What about this instead then?
> 
> No. Really.
> 
> > - leave the "drop the spinlock" thing in place in flock_lock_file for
> >   v3.20
> 
> No. The whole concept of "drop the lock in the middle" is *BROKEN*.
> It's seriously crap. It's not just a bug, it's a really fundamentally
> wrong thing to do.
> 
> > - change locks_remove_flock to just walk the list and delete any locks
> >   associated with the filp being closed
> 
> No. That's still wrong. You can have two people holding a write-lock.
> Seriously. That's *shit*.
> 
> The "drop the spinlock in the middle" must go. There's not even any
> reason for it. Just get rid of it. There can be no deadlock if you get
> rid of it, because
> 
>  - we hold the flc_lock over the whole event, so we can never see any
> half-way state
> 
>  - if we actually decide to sleep (due to conflicting locks) and
> return FILE_LOCK_DEFERRED, we will drop the lock before actually
> sleeping, so nobody else will be deadlocking on this file lock. So any
> *other* person who tries to do an upgrade will not sleep, because the
> pending upgrade will have moved to the blocking list (that whole
> "locks_insert_block" part.
> 
> Ergo, either we'll upgrade the lock (atomically, within flc_lock), or
> we will drop the lock (possibly moving it to the blocking list). I
> don't see a deadlock.
> 
> I think your (and mine - but mine had the more fundamental problem of
> never setting "old_fl" correctly at all) patch had a deadlock because
> you didn't actually remove the old lock when you returned
> FILE_LOCK_DEFERRED.
> 
> But I think the correct minimal patch is actually to just remove the
> "if (found)" statement.
> 
>                        Linus

I agree that there's no deadlock. I also agree that allowing two
LOCK_EX's (or a LOCK_SH + LOCK_EX) on the file is broken. I'm just
leery of making a user-visible change at this point. I'd prefer to let
something like that soak in linux-next for a while.

Another possibility is to keep dropping the spinlock, but check to see
if someone set a new lock on the same filp in the loop after that. If
they have, then we could just remove that lock before adding the new
one.

I don't think that would violate anything since there are no atomicity
guarantees here. If you're setting locks on the same filp from multiple
tasks then you're simply asking for trouble.

I don't expect that most apps do that though, but rather work on their
own set of open file descriptions. Those might get bitten however if we
stop dropping the spinlock there since we'll be changing how flock's
fairness works.

See the attached (untested) patch for what I'm thinking. If you still
think that removing the "if (found)" clause is the right thing to do,
I'll go with that, but I do worry that we might break some (fragile)
app that might rely on the way that flock works today.

-- 
Jeff Layton <jlayton@poochiereds.net>

[-- Attachment #2: 0001-locks-ensure-that-we-can-t-set-multiple-flock-locks-.patch --]
[-- Type: text/x-patch, Size: 1662 bytes --]

From 3212be05d47300fbb5718932f92b33acde3d219c Mon Sep 17 00:00:00 2001
From: Jeff Layton <jeff.layton@primarydata.com>
Date: Tue, 17 Feb 2015 15:08:06 -0500
Subject: [PATCH] locks: ensure that we can't set multiple flock locks for the
 same filp

Currently, we'll drop the spinlock in the middle of flock_lock_file in
the event that we found a lock that needed to be removed prior to an
upgrade or downgrade.

It's possible however for another task to race in and set a lock on
the same filp. If that happens, then we don't want to set an additional
lock, so just remove the one that raced in and set our own.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/locks.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index fe8f9f46445b..099b60a46ccc 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -864,7 +864,7 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
 static int flock_lock_file(struct file *filp, struct file_lock *request)
 {
 	struct file_lock *new_fl = NULL;
-	struct file_lock *fl;
+	struct file_lock *fl, *tmp;
 	struct file_lock_context *ctx;
 	struct inode *inode = file_inode(filp);
 	int error = 0;
@@ -912,7 +912,12 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 	}
 
 find_conflict:
-	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
+	list_for_each_entry_safe(fl, tmp, &ctx->flc_flock, fl_list) {
+		/* did someone set a lock on the same filp? */
+		if (fl->fl_file == filp) {
+			locks_delete_lock_ctx(fl, &dispose);
+			continue;
+		}
 		if (!flock_locks_conflict(request, fl))
 			continue;
 		error = -EAGAIN;
-- 
2.1.0
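[Editor's note on the patch mechanics: the loop switches to list_for_each_entry_safe because it now deletes the entry the cursor is standing on, and the _safe variant keeps a second cursor (tmp) pointing at the next entry before the current one is unlinked. The same pitfall has a direct Python analogue, shown here purely for illustration:]

```python
# Deleting entries out from under a live iterator -- the pitfall that
# list_for_each_entry_safe(fl, tmp, ...) exists to avoid.  Python
# analogue for illustration only.

locks = ["A", "A", "B"]          # two locks for filp "A", one for "B"
for fl in locks:                 # unsafe: mutate the list being walked
    if fl == "A":
        locks.remove(fl)
# Removal shifted the list under the iterator, so the second "A"
# was skipped.
unsafe_result = locks            # one "A" survives

locks = ["A", "A", "B"]
for fl in list(locks):           # safe: walk a snapshot, mutate original
    if fl == "A":
        locks.remove(fl)
safe_result = locks              # all "A" locks removed

print(unsafe_result, safe_result)
```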


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 20:12                 ` Jeff Layton
@ 2015-02-17 20:17                   ` Linus Torvalds
  0 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2015-02-17 20:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, Feb 17, 2015 at 12:12 PM, Jeff Layton <jlayton@poochiereds.net> wrote:
>
> I agree that there's no deadlock. I also agree that allowing two
> LOCK_EX's (or a LOCK_SH + LOCK_EX) on the file is broken. I'm just
> leery of making a user-visible change at this point. I'd prefer to let
> something like that soak in linux-next for a while.

Umm. The *current* behavior is user-visible. As in "catastrophic bug"
user visible. Not just Kirill's report, but the silent "possibly
exclusive file locks aren't exclusive".

No, we're not delaying fixing this because of concerns of other
user-visible behavior. There is absolutely no way other behavior could
possibly be *less* catastrophic than data corruption (in user space)
due to non-working exclusive locks.

> Another possibility is to keep dropping the spinlock, but check to see
> if someone set a new lock on the same filp in the loop after that.

What would that buy?

I agree that replacing the "re-get the spin-lock" with a "goto repeat"
to the top (which will re-get the spinlock and then look for new
existing locks) would also fix the problem with multiple locks, but
it's just more code to do complex stuff that doesn't actually fix
anything. Just removing the known buggy code is actually the simpler
and safer fix.

                            Linus
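[Editor's note: for contrast with the racy version, the shape Linus advocates - no dropped-lock window at all - makes the remove/check/insert sequence a single atomic step under flc_lock, so an upgrade can never leave two locks behind. A rough userspace sketch with invented names, heavily simplified:]

```python
# Sketch of the fix: with the "drop the spinlock" window removed, the
# remove + conflict-check + insert sequence is one atomic step.
# Locks are (filp, type) tuples; names invented for the sketch.

def flock_lock_fixed(locks, filp, typ):
    # Runs entirely "under the spinlock": no other task can interleave.
    locks[:] = [fl for fl in locks if fl[0] is not filp]  # drop old lock
    for _fl_filp, fl_typ in locks:       # only other filps remain
        if "EX" in (typ, fl_typ):        # conflicting lock found
            return False                 # -EAGAIN; caller may block and retry
    locks.append((filp, typ))
    return True

filp = object()
locks = []
assert flock_lock_fixed(locks, filp, "SH")   # take a shared lock
assert flock_lock_fixed(locks, filp, "EX")   # SH -> EX, atomically

print(len(locks))                            # exactly one lock remains
```

Note this keeps the documented flock(2) semantics discussed earlier: a failed conversion still drops the old lock; what it removes is only the window in which a second lock could be inserted for the same filp.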

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-17 19:13           ` Linus Torvalds
  2015-02-17 19:27             ` Jeff Layton
  2015-02-17 19:29             ` Linus Torvalds
@ 2015-02-26 11:00             ` One Thousand Gnomes
  2015-02-26 14:45               ` J. Bruce Fields
  2 siblings, 1 reply; 20+ messages in thread
From: One Thousand Gnomes @ 2015-02-26 11:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: J. Bruce Fields, Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Tue, 17 Feb 2015 11:13:39 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Feb 17, 2015 at 11:08 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > I agree that it's weird, but I think it's what we're stuck with.
> 
> And if by "weird" you mean "flock is really not a well-defined or sane
> interface", I'll agree with you.
> 
> That said, I'm not at all sure about the "we're stuck with it". We can
> improve the semantics without anybody noticing, because it's not like
> anybody could *depend* on the weaker semantics - they needed
> particular races and timings to hit anyway.

The BSD implementation does not document such a race, nor indeed appear
to have one. That implies that nothing using flock should have this
problem.

Alan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-26 11:00             ` One Thousand Gnomes
@ 2015-02-26 14:45               ` J. Bruce Fields
  2015-02-26 15:09                 ` J. Bruce Fields
  0 siblings, 1 reply; 20+ messages in thread
From: J. Bruce Fields @ 2015-02-26 14:45 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Linus Torvalds, Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Thu, Feb 26, 2015 at 11:00:46AM +0000, One Thousand Gnomes wrote:
> On Tue, 17 Feb 2015 11:13:39 -0800
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Tue, Feb 17, 2015 at 11:08 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > >
> > > I agree that it's weird, but I think it's what we're stuck with.
> > 
> > And if by "weird" you mean "flock is really not a well-defined or sane
> > interface", I'll agree with you.
> > 
> > That said, I'm not at all sure about the "we're stuck with it". We can
> > improve the semantics without anybody noticing, because it's not like
> > anybody could *depend* on the weaker semantics - they needed
> > particular races and timings to hit anyway.
> 
> The BSD implementation does not document such a race, nor indeed appear
> to have one. That implies that nothing using flock should have this
> problem.

Which race are you talking about exactly, and what evidence are you
working from?

--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [GIT PULL] please pull file-locking related changes for v3.20
  2015-02-26 14:45               ` J. Bruce Fields
@ 2015-02-26 15:09                 ` J. Bruce Fields
  0 siblings, 0 replies; 20+ messages in thread
From: J. Bruce Fields @ 2015-02-26 15:09 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Linus Torvalds, Jeff Layton, Kirill A. Shutemov, linux-fsdevel,
	Linux Kernel Mailing List, Christoph Hellwig, Dave Chinner,
	Sasha Levin

On Thu, Feb 26, 2015 at 09:45:00AM -0500, J. Bruce Fields wrote:
> On Thu, Feb 26, 2015 at 11:00:46AM +0000, One Thousand Gnomes wrote:
> > On Tue, 17 Feb 2015 11:13:39 -0800
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > On Tue, Feb 17, 2015 at 11:08 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > >
> > > > I agree that it's weird, but I think it's what we're stuck with.
> > > 
> > > And if by "weird" you mean "flock is really not a well-defined or sane
> > > interface", I'll agree with you.
> > > 
> > > That said, I'm not at all sure about the "we're stuck with it". We can
> > > improve the semantics without anybody noticing, because it's not like
> > > anybody could *depend* on the weaker semantics - they needed
> > > particular races and timings to hit anyway.
> > 
> > The BSD implementation does not document such a race, nor indeed appear
> > to have one. That implies that nothing using flock should have this
> > problem.
> 
> Which race are you talking about exactly, and what evidence are you
> working from?

To clarify: I previously conflated two issues:

	- the temporary drop of the spinlock in flock_lock_file().
	  Agreed that that's pointless, and has been fixed.

	- non-atomic flock upgrades: that's definitely documented
	  behavior on BSD.

--b.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-02-26 15:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-09 10:55 [GIT PULL] please pull file-locking related changes for v3.20 Jeff Layton
2015-02-16 13:32 ` Kirill A. Shutemov
2015-02-16 14:00   ` Jeff Layton
2015-02-16 18:46     ` Linus Torvalds
2015-02-16 19:24       ` Linus Torvalds
2015-02-16 19:59         ` Jeff Layton
2015-02-17  0:02         ` Jeff Layton
2015-02-17  0:21           ` Linus Torvalds
2015-02-17  0:35             ` Jeff Layton
2015-02-17 19:08         ` J. Bruce Fields
2015-02-17 19:13           ` Linus Torvalds
2015-02-17 19:27             ` Jeff Layton
2015-02-17 19:41               ` Linus Torvalds
2015-02-17 19:45                 ` J. Bruce Fields
2015-02-17 20:12                 ` Jeff Layton
2015-02-17 20:17                   ` Linus Torvalds
2015-02-17 19:29             ` Linus Torvalds
2015-02-26 11:00             ` One Thousand Gnomes
2015-02-26 14:45               ` J. Bruce Fields
2015-02-26 15:09                 ` J. Bruce Fields
