* [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock
@ 2015-03-02 14:25 Daniel Wagner
  2015-03-02 14:25 ` [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test Daniel Wagner
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 14:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, Daniel Wagner, Alexander Viro,
	Jeff Layton, J. Bruce Fields

Hi Jeff,

I've dropped the spinlock conversion for the time being. Maybe the
last patch, which changes the usage of blocked_lock_lock, is still
useful. And in case I can convince you of the spinlock conversion, it
can easily be done on top of it. I think this also makes it simpler to
review.

cheers,
daniel

v2:
 - added a few lockdep assertions
 - dropped spinlock conversion
 
v1:
 - rebased on v3.19-8975-g3d88348
 - split into smaller pieces
 - fixed a wrong usage of __locks_insert/delete_block() and its posix versions
 - added seq_file helpers to avoid the ugly open coded version


Original cover letter:

I am looking at how to get rid of lglock. The reason being that -rt is
not too happy with that lock, especially since it uses arch_spinlock_t
and therefore is not converted into a mutex on -rt. I know no change is
accepted that only fixes something for -rt alone. So here is my attempt
to make things faster for mainline while also fixing -rt.

There are two users of lglock at this point: fs/locks.c and
kernel/stop_machine.c.

I presume fs/locks is the more interesting one with respect to
performance. Let's have a look at that one first.
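
For reference, this is roughly how an lglock is used (a simplified
sketch of the lglock API, not the actual fs/locks.c code; per-CPU list
initialization and the lg_lock_init() call at boot time are elided):

#include <linux/lglock.h>
#include <linux/percpu.h>
#include <linux/list.h>

DEFINE_STATIC_LGLOCK(demo_lglock);
static DEFINE_PER_CPU(struct hlist_head, demo_list);

static void demo_insert(struct hlist_node *node)
{
	/* fast path: only this CPU's spinlock is taken */
	lg_local_lock(&demo_lglock);
	hlist_add_head(node, this_cpu_ptr(&demo_list));
	lg_local_unlock(&demo_lglock);
}

static void demo_walk_everything(void)
{
	int cpu;

	/* slow path: every CPU's spinlock is taken at once,
	 * which is what /proc/locks currently does */
	lg_global_lock(&demo_lglock);
	for_each_possible_cpu(cpu)
		; /* walk per_cpu_ptr(&demo_list, cpu) here */
	lg_global_unlock(&demo_lglock);
}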

The lglock version of file_lock_lock is used in combination with
blocked_lock_lock to protect file_lock's fl_link, fl_block, fl_next,
blocked_hash and the percpu file_lock_list.

The plan is to reorganize the usage of the locks and what they protect
so that the usage of the global blocked_lock_lock is reduced.

Whenever we insert a new lock, we are going to grab the corresponding
percpu file_lock_lock in addition to the i_lock. The global
blocked_lock_lock is only used when blocked_hash is involved.
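
In v2 terms (the lglock stays for now, and flc_lock has replaced the
i_lock mentioned above), the intended lock ordering on the blocking
path is roughly the following. This is only a sketch; the real hunk
lives in __posix_lock_file() in patch 4/4:

/* Sketch only: the lock ordering when queueing a blocked posix waiter.
 * The caller already holds the per-inode flc_lock. */
static void sketch_queue_posix_waiter(struct file_lock *blocker,
				      struct file_lock *waiter)
{
	lg_local_lock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
	spin_lock(&blocked_lock_lock);	/* only because blocked_hash is touched */
	__locks_insert_posix_block(blocker, waiter);
	spin_unlock(&blocked_lock_lock);
	lg_local_unlock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
}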

file_lock_list exists to be able to produce the content of
/proc/locks. For listing all the locks it seems a bit excessive to
grab all the locks at once. We should be okay just grabbing the
corresponding lock when iterating over the percpu file_lock_list.
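
As an untested sketch (the series as posted still takes lg_global_lock
in locks_start), the iteration I have in mind looks roughly like this:

/* Sketch only: visit each CPU's list under just that CPU's lock
 * instead of holding all of them for the whole walk. */
static void sketch_show_all_locks(struct seq_file *f)
{
	struct file_lock *fl;
	int cpu;

	for_each_possible_cpu(cpu) {
		lg_local_lock_cpu(&file_lock_lglock, cpu);
		hlist_for_each_entry(fl, per_cpu_ptr(&file_lock_list, cpu), fl_link) {
			/* emit one line per lock, e.g. via lock_get_status() */
		}
		lg_local_unlock_cpu(&file_lock_lglock, cpu);
	}
}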

file_lock_lock now protects file_lock_list and fl_link, fl_block and
fl_next alone. That means we need to define which file_lock_lock is
used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
and fl_next.

I haven't found a good way around the open coded seq_ops
(locks_start, locks_next, locks_stop). Maybe someone has a good idea
how to handle the locking there.

For performance testing I used
git://git.samba.org/jlayton/lockperf.git and for correctness
https://github.com/linux-test-project/ltp/tree/master/testcases/network/nfsv4/locks
In case you are missing the posix03 results, my machine doesn't like
that test too much; it brings the machine to its knees due to the very
high load. Probably I need different parameters.

I didn't run extensive tests so far, because I am waiting to get
access to a bigger box than my small i7-4850HQ system. I hope to see
larger improvements when more cores are involved.

[...]

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Daniel Wagner (4):
  locks: Remove unnecessary IS_POSIX test
  locks: Add lockdep assertion for blocked_lock_lock
  locks: Split insert/delete block functions into flock/posix parts
  locks: Use blocked_lock_lock only to protect blocked_hash

 fs/locks.c | 124 +++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 84 insertions(+), 40 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test
  2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
@ 2015-03-02 14:25 ` Daniel Wagner
  2015-03-03  0:55   ` Jeff Layton
  2015-03-02 14:25 ` [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock Daniel Wagner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 14:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, Daniel Wagner, J. Bruce Fields,
	Alexander Viro

Since the following change

commit bd61e0a9c852de2d705b6f1bb2cc54c5774db570
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Jan 16 15:05:55 2015 -0500

    locks: convert posix locks to file_lock_context

all POSIX locks are kept on their own separate list, so the test is
redundant.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Jeff Layton <jlayton@primarydata.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/locks.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 365c82e..f63aa92 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -964,8 +964,6 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 	 */
 	if (request->fl_type != F_UNLCK) {
 		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
-			if (!IS_POSIX(fl))
-				continue;
 			if (!posix_locks_conflict(request, fl))
 				continue;
 			if (conflock)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock
  2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
  2015-03-02 14:25 ` [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test Daniel Wagner
@ 2015-03-02 14:25 ` Daniel Wagner
  2015-03-03  0:55   ` Jeff Layton
  2015-03-02 14:25 ` [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts Daniel Wagner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 14:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-fsdevel, linux-kernel, Daniel Wagner

Annotate the insert, remove and iterate functions where we need
blocked_lock_lock held.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
---
 fs/locks.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index f63aa92..4498da0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -592,11 +592,15 @@ posix_owner_key(struct file_lock *fl)
 
 static void locks_insert_global_blocked(struct file_lock *waiter)
 {
+	lockdep_assert_held(&blocked_lock_lock);
+
 	hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
 }
 
 static void locks_delete_global_blocked(struct file_lock *waiter)
 {
+	lockdep_assert_held(&blocked_lock_lock);
+
 	hash_del(&waiter->fl_link);
 }
 
@@ -838,6 +842,8 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
 {
 	int i = 0;
 
+	lockdep_assert_held(&blocked_lock_lock);
+
 	/*
 	 * This deadlock detector can't reasonably detect deadlocks with
 	 * FL_OFDLCK locks, since they aren't owned by a process, per-se.
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
  2015-03-02 14:25 ` [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test Daniel Wagner
  2015-03-02 14:25 ` [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock Daniel Wagner
@ 2015-03-02 14:25 ` Daniel Wagner
  2015-03-03  0:55   ` Jeff Layton
  2015-03-02 14:25 ` [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash Daniel Wagner
  2015-03-02 15:23 ` [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Jeff Layton
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 14:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, Daniel Wagner, Jeff Layton,
	J. Bruce Fields, Alexander Viro

The locks_insert/delete_block() functions are used for flock, posix
and lease lock types. blocked_lock_lock is used to serialize all access
to fl_link, fl_block, fl_next and blocked_hash. Here, we prepare the
stage for using blocked_lock_lock only to protect blocked_hash.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 4498da0..02821dd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -611,11 +611,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
-	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_block);
 	waiter->fl_next = NULL;
 }
 
+/* Posix block variant of __locks_delete_block.
+ *
+ * Must be called with blocked_lock_lock held.
+ */
+static void __locks_delete_posix_block(struct file_lock *waiter)
+{
+	locks_delete_global_blocked(waiter);
+	__locks_delete_block(waiter);
+}
+
 static void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
@@ -623,6 +632,13 @@ static void locks_delete_block(struct file_lock *waiter)
 	spin_unlock(&blocked_lock_lock);
 }
 
+static void locks_delete_posix_block(struct file_lock *waiter)
+{
+	spin_lock(&blocked_lock_lock);
+	__locks_delete_posix_block(waiter);
+	spin_unlock(&blocked_lock_lock);
+}
+
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
@@ -639,7 +655,17 @@ static void __locks_insert_block(struct file_lock *blocker,
 	BUG_ON(!list_empty(&waiter->fl_block));
 	waiter->fl_next = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
-	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+}
+
+/* Posix block variant of __locks_insert_block.
+ *
+ * Must be called with flc_lock and blocked_lock_lock held.
+ */
+static void __locks_insert_posix_block(struct file_lock *blocker,
+					struct file_lock *waiter)
+{
+	__locks_insert_block(blocker, waiter);
+	if (!IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
 }
 
@@ -675,7 +701,10 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 
 		waiter = list_first_entry(&blocker->fl_block,
 				struct file_lock, fl_block);
-		__locks_delete_block(waiter);
+		if (IS_POSIX(blocker))
+			__locks_delete_posix_block(waiter);
+		else
+			__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
 			waiter->fl_lmops->lm_notify(waiter);
 		else
@@ -985,7 +1014,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
-				__locks_insert_block(fl, request);
+				__locks_insert_posix_block(fl, request);
 			}
 			spin_unlock(&blocked_lock_lock);
 			goto out;
@@ -1186,7 +1215,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
 		if (!error)
 			continue;
 
-		locks_delete_block(fl);
+		locks_delete_posix_block(fl);
 		break;
 	}
 	return error;
@@ -1283,7 +1312,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
 				continue;
 		}
 
-		locks_delete_block(&fl);
+		locks_delete_posix_block(&fl);
 		break;
 	}
 
@@ -2103,7 +2132,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 		if (!error)
 			continue;
 
-		locks_delete_block(fl);
+		if (IS_POSIX(fl))
+			locks_delete_posix_block(fl);
+		else
+			locks_delete_block(fl);
 		break;
 	}
 
@@ -2467,7 +2499,7 @@ posix_unblock_lock(struct file_lock *waiter)
 
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_next)
-		__locks_delete_block(waiter);
+		__locks_delete_posix_block(waiter);
 	else
 		status = -ENOENT;
 	spin_unlock(&blocked_lock_lock);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash
  2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
                   ` (2 preceding siblings ...)
  2015-03-02 14:25 ` [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts Daniel Wagner
@ 2015-03-02 14:25 ` Daniel Wagner
  2015-03-03  0:58   ` Jeff Layton
  2015-03-02 15:23 ` [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Jeff Layton
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 14:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-fsdevel, linux-kernel, Daniel Wagner, Jeff Layton,
	J. Bruce Fields, Alexander Viro

blocked_lock_lock and file_lock_lglock are used to protect file_lock's
fl_link, fl_block, fl_next, blocked_hash and the percpu
file_lock_list.

The plan is to reorganize the usage of the locks and what they protect
so that the usage of the global blocked_lock_lock is reduced.

Whenever we insert a new lock, we are going to grab the corresponding
file_lock_lglock in addition to the flc_lock. The global
blocked_lock_lock is only used when blocked_hash is involved.

file_lock_lglock now protects file_lock_list and fl_link, fl_block and
fl_next alone. That means we need to define which file_lock_lglock is
used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
and fl_next.

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/locks.c | 78 ++++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 02821dd..de15ea8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -162,6 +162,20 @@ int lease_break_time = 45;
  * keep a list on each CPU, with each list protected by its own spinlock via
  * the file_lock_lglock. Note that alterations to the list also require that
  * the relevant flc_lock is held.
+ *
+ * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * pointer for file_lock structures that are acting as lock requests (in
+ * contrast to those that are acting as records of acquired locks).
+ *
+ * file_lock structures acting as lock requests (waiters) use the same
+ * spinlock as those acting as the lock holder (blocker). E.g. the
+ * blocker is initially added to the file_lock_list living on CPU 0,
+ * all waiters on that blocker are serialized via CPU 0 (see
+ * fl_link_cpu usage).
+ *
+ * In particular, adding an entry to the fl_block list requires that you hold
+ * both the flc_lock and the blocked_lock_lock (acquired in that order).
+ * Deleting an entry from the list however only requires the file_lock_lglock.
  */
 DEFINE_STATIC_LGLOCK(file_lock_lglock);
 static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
@@ -183,19 +197,6 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
 /*
  * This lock protects the blocked_hash. Generally, if you're accessing it, you
  * want to be holding this lock.
- *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
- * pointer for file_lock structures that are acting as lock requests (in
- * contrast to those that are acting as records of acquired locks).
- *
- * Note that when we acquire this lock in order to change the above fields,
- * we often hold the flc_lock as well. In certain cases, when reading the fields
- * protected by this lock, we can skip acquiring it iff we already hold the
- * flc_lock.
- *
- * In particular, adding an entry to the fl_block list requires that you hold
- * both the flc_lock and the blocked_lock_lock (acquired in that order).
- * Deleting an entry from the list however only requires the file_lock_lock.
  */
 static DEFINE_SPINLOCK(blocked_lock_lock);
 
@@ -607,7 +608,7 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
 /* Remove waiter from blocker's block list.
  * When blocker ends up pointing to itself then the list is empty.
  *
- * Must be called with blocked_lock_lock held.
+ * Must be called with file_lock_lglock held.
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
@@ -617,7 +618,7 @@ static void __locks_delete_block(struct file_lock *waiter)
 
 /* Posix block variant of __locks_delete_block.
  *
- * Must be called with blocked_lock_lock held.
+ * Must be called with file_lock_lglock held.
  */
 static void __locks_delete_posix_block(struct file_lock *waiter)
 {
@@ -627,16 +628,18 @@ static void __locks_delete_posix_block(struct file_lock *waiter)
 
 static void locks_delete_block(struct file_lock *waiter)
 {
-	spin_lock(&blocked_lock_lock);
+	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 	__locks_delete_block(waiter);
-	spin_unlock(&blocked_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 }
 
 static void locks_delete_posix_block(struct file_lock *waiter)
 {
+	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 	spin_lock(&blocked_lock_lock);
 	__locks_delete_posix_block(waiter);
 	spin_unlock(&blocked_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 }
 
 /* Insert waiter into blocker's block list.
@@ -644,22 +647,23 @@ static void locks_delete_posix_block(struct file_lock *waiter)
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  *
- * Must be called with both the flc_lock and blocked_lock_lock held. The
- * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
+ * Must be called with both the flc_lock and file_lock_lglock held. The
+ * fl_block list itself is protected by the file_lock_lglock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
- * blocked_lock_lock in some cases when we see that the fl_block list is empty.
+ * file_lock_lglock in some cases when we see that the fl_block list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
+	waiter->fl_link_cpu = blocker->fl_link_cpu;
 	waiter->fl_next = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 }
 
 /* Posix block variant of __locks_insert_block.
  *
- * Must be called with flc_lock and blocked_lock_lock held.
+ * Must be called with flc_lock and file_lock_lglock held.
  */
 static void __locks_insert_posix_block(struct file_lock *blocker,
 					struct file_lock *waiter)
@@ -673,9 +677,9 @@ static void __locks_insert_posix_block(struct file_lock *blocker,
 static void locks_insert_block(struct file_lock *blocker,
 					struct file_lock *waiter)
 {
-	spin_lock(&blocked_lock_lock);
+	lg_local_lock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
 	__locks_insert_block(blocker, waiter);
-	spin_unlock(&blocked_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
 }
 
 /*
@@ -686,31 +690,33 @@ static void locks_insert_block(struct file_lock *blocker,
 static void locks_wake_up_blocks(struct file_lock *blocker)
 {
 	/*
-	 * Avoid taking global lock if list is empty. This is safe since new
+	 * Avoid taking lock if list is empty. This is safe since new
 	 * blocked requests are only added to the list under the flc_lock, and
 	 * the flc_lock is always held here. Note that removal from the fl_block
 	 * list does not require the flc_lock, so we must recheck list_empty()
-	 * after acquiring the blocked_lock_lock.
+	 * after acquiring the file_lock_lglock.
 	 */
 	if (list_empty(&blocker->fl_block))
 		return;
 
-	spin_lock(&blocked_lock_lock);
+	lg_local_lock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter;
 
 		waiter = list_first_entry(&blocker->fl_block,
 				struct file_lock, fl_block);
-		if (IS_POSIX(blocker))
+		if (IS_POSIX(blocker)) {
+			spin_lock(&blocked_lock_lock);
 			__locks_delete_posix_block(waiter);
-		else
+			spin_unlock(&blocked_lock_lock);
+		} else
 			__locks_delete_block(waiter);
 		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
 			waiter->fl_lmops->lm_notify(waiter);
 		else
 			wake_up(&waiter->fl_wait);
 	}
-	spin_unlock(&blocked_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
 }
 
 static void
@@ -737,9 +743,11 @@ static void
 locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
 {
 	locks_unlink_lock_ctx(fl);
-	if (dispose)
+	if (dispose) {
+		lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
 		list_add(&fl->fl_list, dispose);
-	else
+		lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+	} else
 		locks_free_lock(fl);
 }
 
@@ -1011,12 +1019,14 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			 * locks list must be done while holding the same lock!
 			 */
 			error = -EDEADLK;
+			lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
 			spin_lock(&blocked_lock_lock);
 			if (likely(!posix_locks_deadlock(request, fl))) {
 				error = FILE_LOCK_DEFERRED;
 				__locks_insert_posix_block(fl, request);
 			}
 			spin_unlock(&blocked_lock_lock);
+			lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
 			goto out;
   		}
   	}
@@ -2497,12 +2507,14 @@ posix_unblock_lock(struct file_lock *waiter)
 {
 	int status = 0;
 
+	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 	spin_lock(&blocked_lock_lock);
 	if (waiter->fl_next)
 		__locks_delete_posix_block(waiter);
 	else
 		status = -ENOENT;
 	spin_unlock(&blocked_lock_lock);
+	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
 	return status;
 }
 EXPORT_SYMBOL(posix_unblock_lock);
@@ -2629,13 +2641,11 @@ static int locks_show(struct seq_file *f, void *v)
 }
 
 static void *locks_start(struct seq_file *f, loff_t *pos)
-	__acquires(&blocked_lock_lock)
 {
 	struct locks_iterator *iter = f->private;
 
 	iter->li_pos = *pos + 1;
 	lg_global_lock(&file_lock_lglock);
-	spin_lock(&blocked_lock_lock);
 	return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
 }
 
@@ -2648,9 +2658,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
 }
 
 static void locks_stop(struct seq_file *f, void *v)
-	__releases(&blocked_lock_lock)
 {
-	spin_unlock(&blocked_lock_lock);
 	lg_global_unlock(&file_lock_lglock);
 }
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock
  2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
                   ` (3 preceding siblings ...)
  2015-03-02 14:25 ` [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash Daniel Wagner
@ 2015-03-02 15:23 ` Jeff Layton
  2015-03-02 16:44   ` Daniel Wagner
  4 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2015-03-02 15:23 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, Alexander Viro,
	J. Bruce Fields

On Mon,  2 Mar 2015 15:25:09 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> Hi Jeff,
> 
> I've dropped the spinlock conversion for the time being. Maybe the
> last patch, which changes the usage of blocked_lock_lock, is still
> useful. And in case I can convince you of the spinlock conversion, it
> can easily be done on top of it. I think this also makes it simpler to
> review.
> 
> cheers,
> daniel
> 
> v2:
>  - added a few lockdep assertions
>  - dropped spinlock conversion
>  
> v1:
>  - rebased on v3.19-8975-g3d88348
>  - split into smaller pieces
>  - fixed a wrong usage of __locks_insert/delete_block() and its posix versions
>  - added seq_file helpers to avoid the ugly open coded version
> 
> 
> Original cover letter:
> 
> I am looking at how to get rid of lglock. The reason being that -rt is
> not too happy with that lock, especially since it uses arch_spinlock_t
> and therefore is not converted into a mutex on -rt. I know no change is
> accepted that only fixes something for -rt alone. So here is my attempt
> to make things faster for mainline while also fixing -rt.
> 
> There are two users of lglock at this point: fs/locks.c and
> kernel/stop_machine.c.
> 
> I presume fs/locks is the more interesting one with respect to
> performance. Let's have a look at that one first.
> 
> The lglock version of file_lock_lock is used in combination with
> blocked_lock_lock to protect file_lock's fl_link, fl_block, fl_next,
> blocked_hash and the percpu file_lock_list.
> 
> The plan is to reorganize the usage of the locks and what they protect
> so that the usage of the global blocked_lock_lock is reduced.
> 
> Whenever we insert a new lock, we are going to grab the corresponding
> percpu file_lock_lock in addition to the i_lock. The global
> blocked_lock_lock is only used when blocked_hash is involved.
> 
> file_lock_list exists to be able to produce the content of
> /proc/locks. For listing all the locks it seems a bit excessive to
> grab all the locks at once. We should be okay just grabbing the
> corresponding lock when iterating over the percpu file_lock_list.
> 
> file_lock_lock now protects file_lock_list and fl_link, fl_block and
> fl_next alone. That means we need to define which file_lock_lock is
> used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
> and fl_next.
> 
> I haven't found a good way around the open coded seq_ops
> (locks_start, locks_next, locks_stop). Maybe someone has a good idea
> how to handle the locking there.
> 
> For performance testing I used
> git://git.samba.org/jlayton/lockperf.git and for correctness
> https://github.com/linux-test-project/ltp/tree/master/testcases/network/nfsv4/locks
> In case you are missing the posix03 results, my machine doesn't like
> that test too much; it brings the machine to its knees due to the very
> high load. Probably I need different parameters.
> 
> I didn't run extensive tests so far, because I am waiting to get
> access to a bigger box than my small i7-4850HQ system. I hope to see
> larger improvements when more cores are involved.
> 
> [...]
> 
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Daniel Wagner (4):
>   locks: Remove unnecessary IS_POSIX test
>   locks: Add lockdep assertion for blocked_lock_lock
>   locks: Split insert/delete block functions into flock/posix parts
>   locks: Use blocked_lock_lock only to protect blocked_hash
> 
>  fs/locks.c | 124 +++++++++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 84 insertions(+), 40 deletions(-)
> 

These look good at first glance, but I do need to go over patches 3 and
4 in more detail.

FWIW, usually when I see "RFC" in the subject, I take it as a hint that
this is still a work in progress and that you're looking for early feedback
on it, and hence that it shouldn't be merged yet. Is that the case
here, or would I be OK to merge these?

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock
  2015-03-02 15:23 ` [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Jeff Layton
@ 2015-03-02 16:44   ` Daniel Wagner
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Wagner @ 2015-03-02 16:44 UTC (permalink / raw)
  To: Jeff Layton, Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, Alexander Viro,
	J. Bruce Fields

On 03/02/2015 04:23 PM, Jeff Layton wrote:
> These look good at first glance, but I do need to go over patches 3 and
> 4 in more detail.
>
> FWIW, usually when I see "RFC" in the subject, I take it as a hint that
> this is still a work in progress and that you're looking for early feedback
> on it, and hence that it shouldn't be merged yet. Is that the case
> here, or would I be OK to merge these?

I screwed that part up. I wanted to send them as 'PATCH'. Though I
have planned to do another benchmark round and see if there isn't any
problem left in there.

Thanks for going easy on me, first time looking at this end of the kernel :)

cheers,
daniel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-02 14:25 ` [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts Daniel Wagner
@ 2015-03-03  0:55   ` Jeff Layton
  2015-03-04 14:20     ` Daniel Wagner
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2015-03-03  0:55 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On Mon,  2 Mar 2015 15:25:12 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> The locks_insert/delete_block() functions are used for flock, posix
> and lease lock types. blocked_lock_lock is used to serialize all access
> to fl_link, fl_block, fl_next and blocked_hash. Here, we prepare the
> stage for using blocked_lock_lock only to protect blocked_hash.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 4498da0..02821dd 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -611,11 +611,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
>   */
>  static void __locks_delete_block(struct file_lock *waiter)
>  {
> -	locks_delete_global_blocked(waiter);
>  	list_del_init(&waiter->fl_block);
>  	waiter->fl_next = NULL;
>  }
>  
> +/* Posix block variant of __locks_delete_block.
> + *
> + * Must be called with blocked_lock_lock held.
> + */
> +static void __locks_delete_posix_block(struct file_lock *waiter)
> +{
> +	locks_delete_global_blocked(waiter);
> +	__locks_delete_block(waiter);
> +}
> +
>  static void locks_delete_block(struct file_lock *waiter)
>  {
>  	spin_lock(&blocked_lock_lock);
> @@ -623,6 +632,13 @@ static void locks_delete_block(struct file_lock *waiter)
>  	spin_unlock(&blocked_lock_lock);
>  }
>  
> +static void locks_delete_posix_block(struct file_lock *waiter)
> +{
> +	spin_lock(&blocked_lock_lock);
> +	__locks_delete_posix_block(waiter);
> +	spin_unlock(&blocked_lock_lock);
> +}
> +
>  /* Insert waiter into blocker's block list.
>   * We use a circular list so that processes can be easily woken up in
>   * the order they blocked. The documentation doesn't require this but
> @@ -639,7 +655,17 @@ static void __locks_insert_block(struct file_lock *blocker,
>  	BUG_ON(!list_empty(&waiter->fl_block));
>  	waiter->fl_next = blocker;
>  	list_add_tail(&waiter->fl_block, &blocker->fl_block);
> -	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> +}
> +
> +/* Posix block variant of __locks_insert_block.
> + *
> + * Must be called with flc_lock and blocked_lock_lock held.
> + */
> +static void __locks_insert_posix_block(struct file_lock *blocker,
> +					struct file_lock *waiter)
> +{
> +	__locks_insert_block(blocker, waiter);
> +	if (!IS_OFDLCK(blocker))
>  		locks_insert_global_blocked(waiter);
>  }
>

In many ways OFD locks act more like flock locks than POSIX ones. In
particular, there is no deadlock detection there, so once your
conversion is done to more widely use the percpu locks, then you should
be able to avoid taking the blocked_lock_lock for OFD locks as well.
The 4th patch in this series doesn't currently do that.

You may want to revisit this patch such that the IS_OFDLCK checks are
done earlier, so that you can avoid taking the blocked_lock_lock if
IS_POSIX and !IS_OFDLCK.
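
Something along these lines, perhaps (untested, just to sketch the idea
for the deadlock-detection branch in __posix_lock_file(); the posted
series doesn't do this yet):

			/* Only a non-OFD posix blocker takes part in deadlock
			 * detection and blocked_hash, so only then do we need
			 * the global blocked_lock_lock. */
			if (IS_POSIX(fl) && !IS_OFDLCK(fl)) {
				spin_lock(&blocked_lock_lock);
				if (likely(!posix_locks_deadlock(request, fl))) {
					error = FILE_LOCK_DEFERRED;
					__locks_insert_posix_block(fl, request);
				}
				spin_unlock(&blocked_lock_lock);
			} else {
				error = FILE_LOCK_DEFERRED;
				locks_insert_block(fl, request);
			}
			goto out;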

> @@ -675,7 +701,10 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
>  
>  		waiter = list_first_entry(&blocker->fl_block,
>  				struct file_lock, fl_block);
> -		__locks_delete_block(waiter);
> +		if (IS_POSIX(blocker))

...again, you probably want to make this read:

		if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))

...so that later you can avoid taking the blocked_lock_lock if IS_OFDLCK(blocker).


> +			__locks_delete_posix_block(waiter);
> +		else
> +			__locks_delete_block(waiter);
>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>  			waiter->fl_lmops->lm_notify(waiter);
>  		else
> @@ -985,7 +1014,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
>  			spin_lock(&blocked_lock_lock);
>  			if (likely(!posix_locks_deadlock(request, fl))) {
>  				error = FILE_LOCK_DEFERRED;
> -				__locks_insert_block(fl, request);
> +				__locks_insert_posix_block(fl, request);
>  			}
>  			spin_unlock(&blocked_lock_lock);
>  			goto out;
> @@ -1186,7 +1215,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
>  		if (!error)
>  			continue;
>  
> -		locks_delete_block(fl);
> +		locks_delete_posix_block(fl);
>  		break;
>  	}
>  	return error;
> @@ -1283,7 +1312,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
>  				continue;
>  		}
>  
> -		locks_delete_block(&fl);
> +		locks_delete_posix_block(&fl);
>  		break;
>  	}
>  
> @@ -2103,7 +2132,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>  		if (!error)
>  			continue;
>  
> -		locks_delete_block(fl);
> +		if (IS_POSIX(fl))
> +			locks_delete_posix_block(fl);
> +		else
> +			locks_delete_block(fl);
>  		break;
>  	}
>  
> @@ -2467,7 +2499,7 @@ posix_unblock_lock(struct file_lock *waiter)
>  
>  	spin_lock(&blocked_lock_lock);
>  	if (waiter->fl_next)
> -		__locks_delete_block(waiter);
> +		__locks_delete_posix_block(waiter);
>  	else
>  		status = -ENOENT;
>  	spin_unlock(&blocked_lock_lock);

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test
  2015-03-02 14:25 ` [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test Daniel Wagner
@ 2015-03-03  0:55   ` Jeff Layton
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-03  0:55 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On Mon,  2 Mar 2015 15:25:10 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> Since the following change
> 
> commit bd61e0a9c852de2d705b6f1bb2cc54c5774db570
> Author: Jeff Layton <jlayton@primarydata.com>
> Date:   Fri Jan 16 15:05:55 2015 -0500
> 
>     locks: convert posix locks to file_lock_context
> 
> all POSIX locks are kept on their own separate list, so the test is
> redundant.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Cc: Jeff Layton <jlayton@primarydata.com>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/locks.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 365c82e..f63aa92 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -964,8 +964,6 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
>  	 */
>  	if (request->fl_type != F_UNLCK) {
>  		list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> -			if (!IS_POSIX(fl))
> -				continue;
>  			if (!posix_locks_conflict(request, fl))
>  				continue;
>  			if (conflock)

Merged for v4.1.

Thanks!
-- 
Jeff Layton <jeff.layton@primarydata.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock
  2015-03-02 14:25 ` [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock Daniel Wagner
@ 2015-03-03  0:55   ` Jeff Layton
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-03  0:55 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Jeff Layton, linux-fsdevel, linux-kernel

On Mon,  2 Mar 2015 15:25:11 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> Annotate the insert, remove and iterate functions where we need
> blocked_lock_lock held.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> ---
>  fs/locks.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index f63aa92..4498da0 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -592,11 +592,15 @@ posix_owner_key(struct file_lock *fl)
>  
>  static void locks_insert_global_blocked(struct file_lock *waiter)
>  {
> +	lockdep_assert_held(&blocked_lock_lock);
> +
>  	hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
>  }
>  
>  static void locks_delete_global_blocked(struct file_lock *waiter)
>  {
> +	lockdep_assert_held(&blocked_lock_lock);
> +
>  	hash_del(&waiter->fl_link);
>  }
>  
> @@ -838,6 +842,8 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
>  {
>  	int i = 0;
>  
> +	lockdep_assert_held(&blocked_lock_lock);
> +
>  	/*
>  	 * This deadlock detector can't reasonably detect deadlocks with
>  	 * FL_OFDLCK locks, since they aren't owned by a process, per-se.

Merged for v4.1.
-- 
Jeff Layton <jeff.layton@primarydata.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash
  2015-03-02 14:25 ` [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash Daniel Wagner
@ 2015-03-03  0:58   ` Jeff Layton
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-03  0:58 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On Mon,  2 Mar 2015 15:25:13 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> blocked_lock_lock and file_lock_lglock are used to protect file_lock's
> fl_link, fl_block, fl_next, blocked_hash and the percpu
> file_lock_list.
> 
> The plan is to reorganize the usage of the locks and what they protect
> so that the usage of the global blocked_lock_lock is reduced.
> 
> Whenever we insert a new lock, we are going to grab the corresponding
> file_lock_lglock in addition to the flc_lock. The global
> blocked_lock_lock is only used when blocked_hash is involved.
> 
> file_lock_lglock now protects file_lock_list and fl_link, fl_block and
> fl_next alone. That means we need to define which file_lock_lglock is
> used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
> and fl_next.
> 
> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/locks.c | 78 ++++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 43 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 02821dd..de15ea8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -162,6 +162,20 @@ int lease_break_time = 45;
>   * keep a list on each CPU, with each list protected by its own spinlock via
>   * the file_lock_lglock. Note that alterations to the list also require that
>   * the relevant flc_lock is held.
> + *
> + * In addition, it also protects the fl->fl_block list, and the fl->fl_next
> + * pointer for file_lock structures that are acting as lock requests (in
> + * contrast to those that are acting as records of acquired locks).
> + *
> + * file_lock structures acting as lock requests (waiters) use the same
> + * spinlock as those acting as the lock holder (blocker). E.g. the
> + * blocker is initially added to the file_lock_list living on CPU 0,
> + * all waiters on that blocker are serialized via CPU 0 (see
> + * fl_link_cpu usage).
> + *
> + * In particular, adding an entry to the fl_block list requires that you hold
> + * both the flc_lock and the blocked_lock_lock (acquired in that order).
> + * Deleting an entry from the list however only requires the file_lock_lglock.
>   */
>  DEFINE_STATIC_LGLOCK(file_lock_lglock);
>  static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
> @@ -183,19 +197,6 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
>  /*
>   * This lock protects the blocked_hash. Generally, if you're accessing it, you
>   * want to be holding this lock.
> - *
> - * In addition, it also protects the fl->fl_block list, and the fl->fl_next
> - * pointer for file_lock structures that are acting as lock requests (in
> - * contrast to those that are acting as records of acquired locks).
> - *
> - * Note that when we acquire this lock in order to change the above fields,
> - * we often hold the flc_lock as well. In certain cases, when reading the fields
> - * protected by this lock, we can skip acquiring it iff we already hold the
> - * flc_lock.
> - *
> - * In particular, adding an entry to the fl_block list requires that you hold
> - * both the flc_lock and the blocked_lock_lock (acquired in that order).
> - * Deleting an entry from the list however only requires the file_lock_lock.
>   */
>  static DEFINE_SPINLOCK(blocked_lock_lock);
>  
> @@ -607,7 +608,7 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
>  /* Remove waiter from blocker's block list.
>   * When blocker ends up pointing to itself then the list is empty.
>   *
> - * Must be called with blocked_lock_lock held.
> + * Must be called with file_lock_lglock held.
>   */
>  static void __locks_delete_block(struct file_lock *waiter)
>  {
> @@ -617,7 +618,7 @@ static void __locks_delete_block(struct file_lock *waiter)
>  
>  /* Posix block variant of __locks_delete_block.
>   *
> - * Must be called with blocked_lock_lock held.
> + * Must be called with file_lock_lglock held.
>   */
>  static void __locks_delete_posix_block(struct file_lock *waiter)
>  {
> @@ -627,16 +628,18 @@ static void __locks_delete_posix_block(struct file_lock *waiter)
>  
>  static void locks_delete_block(struct file_lock *waiter)
>  {
> -	spin_lock(&blocked_lock_lock);
> +	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  	__locks_delete_block(waiter);
> -	spin_unlock(&blocked_lock_lock);
> +	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  }
>  
>  static void locks_delete_posix_block(struct file_lock *waiter)
>  {
> +	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  	spin_lock(&blocked_lock_lock);
>  	__locks_delete_posix_block(waiter);
>  	spin_unlock(&blocked_lock_lock);
> +	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  }
>  
>  /* Insert waiter into blocker's block list.
> @@ -644,22 +647,23 @@ static void locks_delete_posix_block(struct file_lock *waiter)
>   * the order they blocked. The documentation doesn't require this but
>   * it seems like the reasonable thing to do.
>   *
> - * Must be called with both the flc_lock and blocked_lock_lock held. The
> - * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
> + * Must be called with both the flc_lock and file_lock_lglock held. The
> + * fl_block list itself is protected by the file_lock_lglock, but by ensuring
>   * that the flc_lock is also held on insertions we can avoid taking the
> - * blocked_lock_lock in some cases when we see that the fl_block list is empty.
> + * file_lock_lglock in some cases when we see that the fl_block list is empty.
>   */
>  static void __locks_insert_block(struct file_lock *blocker,
>  					struct file_lock *waiter)
>  {
>  	BUG_ON(!list_empty(&waiter->fl_block));
> +	waiter->fl_link_cpu = blocker->fl_link_cpu;
>  	waiter->fl_next = blocker;
>  	list_add_tail(&waiter->fl_block, &blocker->fl_block);
>  }
>  
>  /* Posix block variant of __locks_insert_block.
>   *
> - * Must be called with flc_lock and blocked_lock_lock held.
> + * Must be called with flc_lock and file_lock_lglock held.
>   */
>  static void __locks_insert_posix_block(struct file_lock *blocker,
>  					struct file_lock *waiter)
> @@ -673,9 +677,9 @@ static void __locks_insert_posix_block(struct file_lock *blocker,
>  static void locks_insert_block(struct file_lock *blocker,
>  					struct file_lock *waiter)
>  {
> -	spin_lock(&blocked_lock_lock);
> +	lg_local_lock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
>  	__locks_insert_block(blocker, waiter);
> -	spin_unlock(&blocked_lock_lock);
> +	lg_local_unlock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
>  }
>  
>  /*
> @@ -686,31 +690,33 @@ static void locks_insert_block(struct file_lock *blocker,
>  static void locks_wake_up_blocks(struct file_lock *blocker)
>  {
>  	/*
> -	 * Avoid taking global lock if list is empty. This is safe since new
> +	 * Avoid taking lock if list is empty. This is safe since new
>  	 * blocked requests are only added to the list under the flc_lock, and
>  	 * the flc_lock is always held here. Note that removal from the fl_block
>  	 * list does not require the flc_lock, so we must recheck list_empty()
> -	 * after acquiring the blocked_lock_lock.
> +	 * after acquiring the file_lock_lglock.
>  	 */
>  	if (list_empty(&blocker->fl_block))
>  		return;
>  
> -	spin_lock(&blocked_lock_lock);
> +	lg_local_lock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
>  	while (!list_empty(&blocker->fl_block)) {
>  		struct file_lock *waiter;
>  
>  		waiter = list_first_entry(&blocker->fl_block,
>  				struct file_lock, fl_block);
> -		if (IS_POSIX(blocker))
> +		if (IS_POSIX(blocker)) {
> +			spin_lock(&blocked_lock_lock);
>  			__locks_delete_posix_block(waiter);
> -		else
> +			spin_unlock(&blocked_lock_lock);
> +		} else
>  			__locks_delete_block(waiter);

So again, as I tried to point out in the last patch, I think we can do
better here and also avoid taking the blocked_lock_lock when the
blocker is an OFD lock.
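
IOW, something like the following (untested) should be enough here:

		if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) {
			/* only non-OFD posix blockers have waiters in
			 * blocked_hash, so only they need the global lock */
			spin_lock(&blocked_lock_lock);
			__locks_delete_posix_block(waiter);
			spin_unlock(&blocked_lock_lock);
		} else {
			__locks_delete_block(waiter);
		}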

>  		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>  			waiter->fl_lmops->lm_notify(waiter);
>  		else
>  			wake_up(&waiter->fl_wait);
>  	}
> -	spin_unlock(&blocked_lock_lock);
> +	lg_local_unlock_cpu(&file_lock_lglock, blocker->fl_link_cpu);
>  }
>  
>  static void
> @@ -737,9 +743,11 @@ static void
>  locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
>  {
>  	locks_unlink_lock_ctx(fl);
> -	if (dispose)
> +	if (dispose) {
> +		lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
>  		list_add(&fl->fl_list, dispose);
> -	else
> +		lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
> +	} else
>  		locks_free_lock(fl);
>  }
>  
> @@ -1011,12 +1019,14 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
>  			 * locks list must be done while holding the same lock!
>  			 */
>  			error = -EDEADLK;
> +			lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
>  			spin_lock(&blocked_lock_lock);
>  			if (likely(!posix_locks_deadlock(request, fl))) {
>  				error = FILE_LOCK_DEFERRED;
>  				__locks_insert_posix_block(fl, request);
>  			}
>  			spin_unlock(&blocked_lock_lock);
> +			lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
>  			goto out;
>    		}
>    	}
> @@ -2497,12 +2507,14 @@ posix_unblock_lock(struct file_lock *waiter)
>  {
>  	int status = 0;
>  
> +	lg_local_lock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  	spin_lock(&blocked_lock_lock);
>  	if (waiter->fl_next)
>  		__locks_delete_posix_block(waiter);
>  	else
>  		status = -ENOENT;
>  	spin_unlock(&blocked_lock_lock);
> +	lg_local_unlock_cpu(&file_lock_lglock, waiter->fl_link_cpu);
>  	return status;
>  }
>  EXPORT_SYMBOL(posix_unblock_lock);
> @@ -2629,13 +2641,11 @@ static int locks_show(struct seq_file *f, void *v)
>  }
>  
>  static void *locks_start(struct seq_file *f, loff_t *pos)
> -	__acquires(&blocked_lock_lock)
>  {
>  	struct locks_iterator *iter = f->private;
>  
>  	iter->li_pos = *pos + 1;
>  	lg_global_lock(&file_lock_lglock);
> -	spin_lock(&blocked_lock_lock);
>  	return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
>  }
>  
> @@ -2648,9 +2658,7 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
>  }
>  
>  static void locks_stop(struct seq_file *f, void *v)
> -	__releases(&blocked_lock_lock)
>  {
> -	spin_unlock(&blocked_lock_lock);
>  	lg_global_unlock(&file_lock_lglock);
>  }
>  


-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-03  0:55   ` Jeff Layton
@ 2015-03-04 14:20     ` Daniel Wagner
  2015-03-04 15:00       ` Boaz Harrosh
  2015-03-04 21:01       ` Jeff Layton
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Wagner @ 2015-03-04 14:20 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On 03/03/2015 01:55 AM, Jeff Layton wrote:
> On Mon,  2 Mar 2015 15:25:12 +0100
> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> 
>> The locks_insert/delete_block() functions are used for flock, posix
>> and lease lock types. blocked_lock_lock is used to serialize all access
>> to fl_link, fl_block, fl_next and blocked_hash. Here, we prepare the
>> stage for using blocked_lock_lock only to protect blocked_hash.
>>
>> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
>> Cc: Jeff Layton <jlayton@poochiereds.net>
>> Cc: "J. Bruce Fields" <bfields@fieldses.org>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> ---
>>  fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 40 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 4498da0..02821dd 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -611,11 +611,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
>>   */
>>  static void __locks_delete_block(struct file_lock *waiter)
>>  {
>> -	locks_delete_global_blocked(waiter);
>>  	list_del_init(&waiter->fl_block);
>>  	waiter->fl_next = NULL;
>>  }
>>  
>> +/* Posix block variant of __locks_delete_block.
>> + *
>> + * Must be called with blocked_lock_lock held.
>> + */
>> +static void __locks_delete_posix_block(struct file_lock *waiter)
>> +{
>> +	locks_delete_global_blocked(waiter);
>> +	__locks_delete_block(waiter);
>> +}
>> +
>>  static void locks_delete_block(struct file_lock *waiter)
>>  {
>>  	spin_lock(&blocked_lock_lock);
>> @@ -623,6 +632,13 @@ static void locks_delete_block(struct file_lock *waiter)
>>  	spin_unlock(&blocked_lock_lock);
>>  }
>>  
>> +static void locks_delete_posix_block(struct file_lock *waiter)
>> +{
>> +	spin_lock(&blocked_lock_lock);
>> +	__locks_delete_posix_block(waiter);
>> +	spin_unlock(&blocked_lock_lock);
>> +}
>> +
>>  /* Insert waiter into blocker's block list.
>>   * We use a circular list so that processes can be easily woken up in
>>   * the order they blocked. The documentation doesn't require this but
>> @@ -639,7 +655,17 @@ static void __locks_insert_block(struct file_lock *blocker,
>>  	BUG_ON(!list_empty(&waiter->fl_block));
>>  	waiter->fl_next = blocker;
>>  	list_add_tail(&waiter->fl_block, &blocker->fl_block);
>> -	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
>> +}
>> +
>> +/* Posix block variant of __locks_insert_block.
>> + *
>> + * Must be called with flc_lock and blocked_lock_lock held.
>> + */
>> +static void __locks_insert_posix_block(struct file_lock *blocker,
>> +					struct file_lock *waiter)
>> +{
>> +	__locks_insert_block(blocker, waiter);
>> +	if (!IS_OFDLCK(blocker))
>>  		locks_insert_global_blocked(waiter);
>>  }
>>
> 
> In many ways OFD locks act more like flock locks than POSIX ones. In
> particular, there is no deadlock detection there, so once your
> conversion is done to more widely use the percpu locks, then you should
> be able to avoid taking the blocked_lock_lock for OFD locks as well.
> The 4th patch in this series doesn't currently do that.
> 
> You may want to revisit this patch such that the IS_OFDLCK checks are
> done earlier, so that you can avoid taking the blocked_lock_lock if
> IS_POSIX and !IS_OFDLCK.

Thanks for the explanation. I was not entirely sure what to do here
and forgot to ask.

I have fixed that stuff and now I am testing it. Though it seems
that there is a memory leak which can be triggered with 

	while true; do rm -rf /tmp/a; ./lease02 /tmp/a; done

and this also happens without any of my patches. Still trying to
figure out what's happening. Hopefully I am just seeing a ghost.

slabtop tells me that ftrace_event_field is constantly growing:

 Active / Total Objects (% used)    : 968819303 / 968828665 (100.0%)
 Active / Total Slabs (% used)      : 11404623 / 11404623 (100.0%)
 Active / Total Caches (% used)     : 72 / 99 (72.7%)
 Active / Total Size (% used)       : 45616199.68K / 45619608.73K (100.0%)
 Minimum / Average / Maximum Object : 0.01K / 0.05K / 16.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
967510630 967510630   2%    0.05K 11382478       85  45529912K ftrace_event_field
154368 154368 100%    0.03K   1206      128      4824K kmalloc-32
121856 121856 100%    0.01K    238      512       952K kmalloc-8
121227 121095  99%    0.08K   2377       51      9508K Acpi-State

This is on real hardware. On a kvm guest, fasync_cache grows fast and eventually the
guest runs out of memory. systemd tries hard to restart everything and fails constantly:

[  187.021758] systemd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[  187.022337] systemd cpuset=/ mems_allowed=0
[  187.022662] CPU: 3 PID: 1 Comm: systemd Not tainted 4.0.0-rc1+ #380
[  187.023117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[  187.023801]  ffff88007c918000 ffff88007c9179c8 ffffffff81b4f9be ffffffff8116a9cc
[  187.024373]  0000000000000000 ffff88007c917a88 ffffffff8116a9d1 000000007c917a58
[  187.024940]  ffffffff8224bc98 ffff88007c917a28 0000000000000092 ffffffff81c1b780
[  187.025515] Call Trace:
[  187.025698]  [<ffffffff81b4f9be>] dump_stack+0x4c/0x65
[  187.026083]  [<ffffffff8116a9cc>] ? dump_header.isra.13+0x7c/0x450
[  187.026525]  [<ffffffff8116a9d1>] dump_header.isra.13+0x81/0x450
[  187.026958]  [<ffffffff810a45c6>] ? trace_hardirqs_on_caller+0x16/0x240
[  187.027437]  [<ffffffff810a47fd>] ? trace_hardirqs_on+0xd/0x10
[  187.027859]  [<ffffffff814fe5c4>] ? ___ratelimit+0x84/0x110
[  187.028264]  [<ffffffff8116b378>] oom_kill_process+0x1e8/0x4c0
[  187.028683]  [<ffffffff8105fda5>] ? has_ns_capability_noaudit+0x5/0x170
[  187.029167]  [<ffffffff8116baf4>] __out_of_memory+0x4a4/0x510
[  187.029579]  [<ffffffff8116bd2b>] out_of_memory+0x5b/0x80
[  187.029970]  [<ffffffff81170f2e>] __alloc_pages_nodemask+0xa0e/0xb60
[  187.030434]  [<ffffffff811ad863>] read_swap_cache_async+0xe3/0x180
[  187.030881]  [<ffffffff811ad9ed>] swapin_readahead+0xed/0x190
[  187.031300]  [<ffffffff8119bcae>] handle_mm_fault+0xbbe/0x1180
[  187.031719]  [<ffffffff81046bed>] __do_page_fault+0x1ed/0x4c0
[  187.032138]  [<ffffffff81046ecc>] do_page_fault+0xc/0x10
[  187.032520]  [<ffffffff81b5ddc2>] page_fault+0x22/0x30
[  187.032889] Mem-Info:
[  187.033066] DMA per-cpu:
[  187.033254] CPU    0: hi:    0, btch:   1 usd:   0
[  187.033596] CPU    1: hi:    0, btch:   1 usd:   0
[  187.033941] CPU    2: hi:    0, btch:   1 usd:   0
[  187.034292] CPU    3: hi:    0, btch:   1 usd:   0
[  187.034637] DMA32 per-cpu:
[  187.034837] CPU    0: hi:  186, btch:  31 usd:  51
[  187.035185] CPU    1: hi:  186, btch:  31 usd:   0
[  187.035529] CPU    2: hi:  186, btch:  31 usd:   0
[  187.035873] CPU    3: hi:  186, btch:  31 usd:  32
[  187.036221] active_anon:5 inactive_anon:0 isolated_anon:0
[  187.036221]  active_file:238 inactive_file:194 isolated_file:0
[  187.036221]  unevictable:0 dirty:0 writeback:8 unstable:0
[  187.036221]  free:3361 slab_reclaimable:4651 slab_unreclaimable:493909
[  187.036221]  mapped:347 shmem:0 pagetables:400 bounce:0
[  187.036221]  free_cma:0
[  187.038385] DMA free:7848kB min:44kB low:52kB high:64kB active_anon:4kB inactive_anon:12kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:8kB mapped:4kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:7880kB kernel_stack:32kB pagetables:36kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:132 all_unreclaimable? yes
[  187.041138] lowmem_reserve[]: 0 1952 1952 1952
[  187.041510] DMA32 free:5596kB min:5628kB low:7032kB high:8440kB active_anon:16kB inactive_anon:0kB active_file:952kB inactive_file:772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2004912kB mlocked:0kB dirty:0kB writeback:24kB mapped:1384kB shmem:0kB slab_reclaimable:18592kB slab_unreclaimable:1967756kB kernel_stack:1968kB pagetables:1564kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12716 all_unreclaimable? yes
[  187.044442] lowmem_reserve[]: 0 0 0 0
[  187.044756] DMA: 4*4kB (UEM) 2*8kB (UM) 5*16kB (UEM) 2*32kB (UE) 2*64kB (EM) 3*128kB (UEM) 2*256kB (EM) 3*512kB (UEM) 3*1024kB (UEM) 1*2048kB (R) 0*4096kB = 7856kB
[  187.046022] DMA32: 190*4kB (UER) 6*8kB (R) 1*16kB (R) 1*32kB (R) 0*64kB 0*128kB 1*256kB (R) 1*512kB (R) 0*1024kB 0*2048kB 1*4096kB (R) = 5720kB
[  187.047128] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  187.047724] 554 total pagecache pages
[  187.047991] 60 pages in swap cache
[  187.048259] Swap cache stats: add 102769, delete 102709, find 75688/136761
[  187.048748] Free swap  = 1041456kB
[  187.048995] Total swap = 1048572kB
[  187.049250] 524158 pages RAM
[  187.049463] 0 pages HighMem/MovableOnly
[  187.049739] 18953 pages reserved
[  187.049974] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  187.050587] [ 1293]     0  1293    10283        1      23       2      131         -1000 systemd-udevd
[  187.051253] [ 1660]     0  1660    12793       57      24       2      134         -1000 auditd
[  187.051872] [ 1681]    81  1681     6637        1      18       2      124          -900 dbus-daemon
[  187.052529] [ 1725]     0  1725    20707        0      42       3      216         -1000 sshd
[  187.053146] [ 2344]     0  2344     3257        0      11       2       49             0 systemd-cgroups
[  187.053820] [ 2345]     0  2345     3257        0      11       2       55             0 systemd-cgroups
[  187.054497] [ 2350]     0  2350     3257        0      11       2       35             0 systemd-cgroups
[  187.055175] [ 2352]     0  2352     3257        0      12       2       37             0 systemd-cgroups
[  187.055846] [ 2354]     0  2354     3257        0      11       2       43             0 systemd-cgroups
[  187.056530] [ 2355]     0  2355     3257        0      11       2       40             0 systemd-cgroups
[  187.057212] [ 2356]     0  2356     3257        0      11       2       44             0 systemd-cgroups
[  187.057886] [ 2362]     0  2362     3257        0      11       3       33             0 systemd-cgroups
[  187.058564] [ 2371]     0  2371     3257        0      11       2       33             0 systemd-cgroups
[  187.059244] [ 2372]     0  2372     3257        0      10       2       44             0 systemd-cgroups
[  187.059917] [ 2373]     0  2373     3257        0      11       2       39             0 systemd-cgroups
[  187.060600] [ 2376]     0  2376     3257        0      11       2       34             0 systemd-cgroups
[  187.061280] [ 2377]     0  2377     3257        0      10       2       43             0 systemd-cgroups
[  187.061942] [ 2378]     0  2378     3257        0      12       3       34             0 systemd-cgroups
[  187.062598] [ 2379]     0  2379    27502        0      10       3       33             0 agetty
[  187.063200] [ 2385]     0  2385     3257        0      12       2       44             0 systemd-cgroups
[  187.063859] [ 2390]     0  2390     3257        0      11       2       43             0 systemd-cgroups
[  187.064520] [ 2394]     0  2394     3257        0      11       2       41             0 systemd-cgroups
[  187.065182] [ 2397]     0  2397     3257        0      11       2       43             0 systemd-cgroups
[  187.065833] [ 2402]     0  2402     3257        0      11       2       42             0 systemd-cgroups
[  187.066490] [ 2403]     0  2403     3257        0      11       2       44             0 systemd-cgroups
[  187.067148] [ 2404]     0  2404    27502        0      13       3       30             0 agetty
[  187.067743] [ 2410]     0  2410     3257        0      11       2       43             0 systemd-cgroups
[  187.068407] [ 2413]     0  2413     3257        0      11       2       36             0 systemd-cgroups
[  187.069072] [ 2416]     0  2416     3257        0      11       2       49             0 systemd-cgroups
[  187.069720] [ 2417]     0  2417    11861      173      26       2      334             0 (journald)
[  187.070368] Out of memory: Kill process 2417 ((journald)) score 0 or sacrifice child
[  187.070943] Killed process 2417 ((journald)) total-vm:47444kB, anon-rss:0kB, file-rss:692kB
[  187.513857] systemd[1]: Unit systemd-logind.service entered failed state.
[  188.262477] systemd[1]: Unit systemd-journald.service entered failed state.
[  188.315222] systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
[  188.334194] systemd[1]: Stopping Login Service...
[  188.341556] systemd[1]: Starting Login Service...
[  188.408787] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
[  189.284506] systemd[1]: Stopping Journal Service...
[  189.330806] systemd[1]: Starting Journal Service...
[  189.384800] systemd[1]: Started Journal Service.


cheers,
daniel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 14:20     ` Daniel Wagner
@ 2015-03-04 15:00       ` Boaz Harrosh
  2015-03-04 15:32         ` Daniel Wagner
  2015-03-04 21:01       ` Jeff Layton
  1 sibling, 1 reply; 19+ messages in thread
From: Boaz Harrosh @ 2015-03-04 15:00 UTC (permalink / raw)
  To: Daniel Wagner, Jeff Layton
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On 03/04/2015 04:20 PM, Daniel Wagner wrote:
> On 03/03/2015 01:55 AM, Jeff Layton wrote:
>> On Mon,  2 Mar 2015 15:25:12 +0100
>> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
>>
<>
> I have fixed that stuff and now I am testing it. Though it seems
> that there is a memory leak which can be triggered with 
> 
> 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
> 
> and this happens also without any of my patches. Still trying to
> figure out what's happening. Hopefully I just see a ghost.
> 
> slabtop tells me that ftrace_event_field is constantly growing:
> 

Check out the kernel's leak detector (kmemleak); it is perfect for
showing you the exact call stack of the leaked memory.

(Tell me if you need the exact Kconfig key to enable it; it should
 not be hard to find.)
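
For reference, a minimal sequence for using it (assuming
CONFIG_DEBUG_KMEMLEAK=y and debugfs mounted) looks roughly like:

	# trigger a scan and read back the reports; each report includes
	# the allocation backtrace of the suspected leak
	echo scan > /sys/kernel/debug/kmemleak
	cat /sys/kernel/debug/kmemleak

	# clear the current list before the next test run
	echo clear > /sys/kernel/debug/kmemleak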

Cheers
Boaz


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 15:00       ` Boaz Harrosh
@ 2015-03-04 15:32         ` Daniel Wagner
  2015-03-04 17:59           ` Jeff Layton
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Wagner @ 2015-03-04 15:32 UTC (permalink / raw)
  To: Boaz Harrosh, Jeff Layton
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On 03/04/2015 04:00 PM, Boaz Harrosh wrote:
> On 03/04/2015 04:20 PM, Daniel Wagner wrote:
>> On 03/03/2015 01:55 AM, Jeff Layton wrote:
>>> On Mon,  2 Mar 2015 15:25:12 +0100
>>> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
>>>
> <>
>> I have fixed that stuff and now I am testing it. Though it seems
>> that there is a memory leak which can be triggered with 
>>
>> 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
>>
>> and this happens also without any of my patches. Still trying to
>> figure out what's happening. Hopefully I just see a ghost.
>>
>> slabtop tells me that ftrace_event_field is constantly growing:
>>
> 
> check out the Kernel's leak detector it is perfect in showing you
> what was the exact call stack of the leaked memory.

Thanks for the tip. Will use it in future :)

I have done a quick bisect, limiting the search to fs/locks.c.
I suspect that the file_lock_context refactoring is the source of the leak,
and bisect agrees with me:


8634b51f6ca298fb8b07aa4847340764903533ab is the first bad commit
commit 8634b51f6ca298fb8b07aa4847340764903533ab
Author: Jeff Layton <jlayton@primarydata.com>
Date:   Fri Jan 16 15:05:55 2015 -0500

    locks: convert lease handling to file_lock_context
    
    Signed-off-by: Jeff Layton <jlayton@primarydata.com>
    Acked-by: Christoph Hellwig <hch@lst.de>

:040000 040000 4114db9392dc4dadb30664b71a954321e5e87bab 5b9abbaf1808a7c926c09fa2164044e0cc26fd54 M      fs
:040000 040000 bd569f527a195edf673c4f7d0e80bf356c7f8d1b 6362646e04dd83efc1a9e92877900797ac879e9a M      include
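
For reference, a bisect limited to fs/locks.c looks roughly like the
following (the good/bad revisions here are placeholders, not the exact
ones used):

	git bisect start -- fs/locks.c
	git bisect bad v4.0-rc1
	git bisect good v3.19
	# build and boot each suggested kernel, run the lease02 loop,
	# then mark it with "git bisect good" or "git bisect bad"

With the pathspec, git only offers commits that touch fs/locks.c.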


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 15:32         ` Daniel Wagner
@ 2015-03-04 17:59           ` Jeff Layton
  2015-03-04 19:16             ` Jeff Layton
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2015-03-04 17:59 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Boaz Harrosh, Jeff Layton, linux-fsdevel, linux-kernel,
	J. Bruce Fields, Alexander Viro

On Wed, 4 Mar 2015 16:32:57 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> On 03/04/2015 04:00 PM, Boaz Harrosh wrote:
> > On 03/04/2015 04:20 PM, Daniel Wagner wrote:
> >> On 03/03/2015 01:55 AM, Jeff Layton wrote:
> >>> On Mon,  2 Mar 2015 15:25:12 +0100
> >>> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> >>>
> > <>
> >> I have fixed that stuff and now I am testing it. Though it seems
> >> that there is a memory leak which can be triggered with 
> >>
> >> 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
> >>
> >> and this happens also without any of my patches. Still trying to
> >> figure out what's happening. Hopefully I just see a ghost.
> >>
> >> slabtop tells me that ftrace_event_field is constantly growing:
> >>
> > 
> > check out the Kernel's leak detector it is perfect in showing you
> > what was the exact call stack of the leaked memory.
> 
> Thanks for the tip. Will use it in future :)
> 
> I have done a quick bisect limit the search on fs/locks.c.
> I suspect that the file_lock_context refactoring is the source of the leak.
> bisect agrees with me
> 
> 
> 8634b51f6ca298fb8b07aa4847340764903533ab is the first bad commit
> commit 8634b51f6ca298fb8b07aa4847340764903533ab
> Author: Jeff Layton <jlayton@primarydata.com>
> Date:   Fri Jan 16 15:05:55 2015 -0500
> 
>     locks: convert lease handling to file_lock_context
>     
>     Signed-off-by: Jeff Layton <jlayton@primarydata.com>
>     Acked-by: Christoph Hellwig <hch@lst.de>
> 
> :040000 040000 4114db9392dc4dadb30664b71a954321e5e87bab 5b9abbaf1808a7c926c09fa2164044e0cc26fd54 M      fs
> :040000 040000 bd569f527a195edf673c4f7d0e80bf356c7f8d1b 6362646e04dd83efc1a9e92877900797ac879e9a M      include
> 

Thanks. I'll take a look.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 17:59           ` Jeff Layton
@ 2015-03-04 19:16             ` Jeff Layton
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-04 19:16 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Boaz Harrosh, Jeff Layton, linux-fsdevel, linux-kernel,
	J. Bruce Fields, Alexander Viro

On Wed, 4 Mar 2015 12:59:23 -0500
Jeff Layton <jlayton@poochiereds.net> wrote:

> On Wed, 4 Mar 2015 16:32:57 +0100
> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> 
> > On 03/04/2015 04:00 PM, Boaz Harrosh wrote:
> > > On 03/04/2015 04:20 PM, Daniel Wagner wrote:
> > >> On 03/03/2015 01:55 AM, Jeff Layton wrote:
> > >>> On Mon,  2 Mar 2015 15:25:12 +0100
> > >>> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> > >>>
> > > <>
> > >> I have fixed that stuff and now I am testing it. Though it seems
> > >> that there is a memory leak which can be triggered with 
> > >>
> > >> 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
> > >>
> > >> and this happens also without any of my patches. Still trying to
> > >> figure out what's happening. Hopefully I just see a ghost.
> > >>
> > >> slabtop tells me that ftrace_event_field is constantly growing:
> > >>
> > > 
> > > check out the Kernel's leak detector it is perfect in showing you
> > > what was the exact call stack of the leaked memory.
> > 
> > Thanks for the tip. Will use it in future :)
> > 
> > I have done a quick bisect limit the search on fs/locks.c.
> > I suspect that the file_lock_context refactoring is the source of the leak.
> > bisect agrees with me
> > 
> > 
> > 8634b51f6ca298fb8b07aa4847340764903533ab is the first bad commit
> > commit 8634b51f6ca298fb8b07aa4847340764903533ab
> > Author: Jeff Layton <jlayton@primarydata.com>
> > Date:   Fri Jan 16 15:05:55 2015 -0500
> > 
> >     locks: convert lease handling to file_lock_context
> >     
> >     Signed-off-by: Jeff Layton <jlayton@primarydata.com>
> >     Acked-by: Christoph Hellwig <hch@lst.de>
> > 
> > :040000 040000 4114db9392dc4dadb30664b71a954321e5e87bab 5b9abbaf1808a7c926c09fa2164044e0cc26fd54 M      fs
> > :040000 040000 bd569f527a195edf673c4f7d0e80bf356c7f8d1b 6362646e04dd83efc1a9e92877900797ac879e9a M      include
> > 
> 
> Thanks. I'll take a look.
> 

Huh. I was a bit surprised by this, as I didn't really touch how the
fasync entries get handled. I added a bit of printk debugging
(primitive, I know...) and I see this:

[  458.715319] lease_modify: calling fasync_helper on ffff880035a942d0

So, fasync_helper is getting called on the fasync entry, but it's
definitely not getting freed. When I look at the object in the
debugger, it looks like call_rcu has been called on it, though:

  fa_file = 0x0, 
  fa_rcu = {
    next = 0xffff8800ccd6a8a0, 
    func = 0xffffffff8122b1c0 <fasync_free_rcu>
  }

...it's almost like the RCU grace period isn't ending properly? I'll
keep poking at it, though, to see if I can figure out what's going wrong.
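
For context, the removal path in fs/fcntl.c is roughly the following (a
from-memory sketch, not a verbatim copy):

	/* in fasync_remove_entry(): the entry is unlinked under fasync_lock,
	 * fa_file is cleared, and the actual free is deferred to RCU */
	fa->fa_file = NULL;
	*fp = fa->fa_next;
	call_rcu(&fa->fa_rcu, fasync_free_rcu);

	/* the RCU callback just returns the entry to fasync_cache */
	static void fasync_free_rcu(struct rcu_head *head)
	{
		kmem_cache_free(fasync_cache,
				container_of(head, struct fasync_struct, fa_rcu));
	}

If that callback never runs, fasync_cache grows without bound, which
would match what Daniel sees on the kvm guest.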

-- 
Jeff Layton <jeff.layton@primarydata.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 14:20     ` Daniel Wagner
  2015-03-04 15:00       ` Boaz Harrosh
@ 2015-03-04 21:01       ` Jeff Layton
  2015-03-04 21:12         ` Jeff Layton
  2015-03-04 21:13         ` Daniel Wagner
  1 sibling, 2 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-04 21:01 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On Wed, 4 Mar 2015 15:20:33 +0100
Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:

> On 03/03/2015 01:55 AM, Jeff Layton wrote:
> > On Mon,  2 Mar 2015 15:25:12 +0100
> > Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> > 
> >> The locks_insert/delete_block() functions are used for flock, posix
> >> and leases types. blocked_lock_lock is used to serialize all access to
> >> fl_link, fl_block, fl_next and blocked_hash. Here, we prepare the
> >> stage for using blocked_lock_lock to protect blocked_hash.
> >>
> >> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> >> Cc: Jeff Layton <jlayton@poochiereds.net>
> >> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> >> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> >> ---
> >>  fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> >>  1 file changed, 40 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/fs/locks.c b/fs/locks.c
> >> index 4498da0..02821dd 100644
> >> --- a/fs/locks.c
> >> +++ b/fs/locks.c
> >> @@ -611,11 +611,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
> >>   */
> >>  static void __locks_delete_block(struct file_lock *waiter)
> >>  {
> >> -	locks_delete_global_blocked(waiter);
> >>  	list_del_init(&waiter->fl_block);
> >>  	waiter->fl_next = NULL;
> >>  }
> >>  
> >> +/* Posix block variant of __locks_delete_block.
> >> + *
> >> + * Must be called with blocked_lock_lock held.
> >> + */
> >> +static void __locks_delete_posix_block(struct file_lock *waiter)
> >> +{
> >> +	locks_delete_global_blocked(waiter);
> >> +	__locks_delete_block(waiter);
> >> +}
> >> +
> >>  static void locks_delete_block(struct file_lock *waiter)
> >>  {
> >>  	spin_lock(&blocked_lock_lock);
> >> @@ -623,6 +632,13 @@ static void locks_delete_block(struct file_lock *waiter)
> >>  	spin_unlock(&blocked_lock_lock);
> >>  }
> >>  
> >> +static void locks_delete_posix_block(struct file_lock *waiter)
> >> +{
> >> +	spin_lock(&blocked_lock_lock);
> >> +	__locks_delete_posix_block(waiter);
> >> +	spin_unlock(&blocked_lock_lock);
> >> +}
> >> +
> >>  /* Insert waiter into blocker's block list.
> >>   * We use a circular list so that processes can be easily woken up in
> >>   * the order they blocked. The documentation doesn't require this but
> >> @@ -639,7 +655,17 @@ static void __locks_insert_block(struct file_lock *blocker,
> >>  	BUG_ON(!list_empty(&waiter->fl_block));
> >>  	waiter->fl_next = blocker;
> >>  	list_add_tail(&waiter->fl_block, &blocker->fl_block);
> >> -	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> >> +}
> >> +
> >> +/* Posix block variant of __locks_insert_block.
> >> + *
> >> + * Must be called with flc_lock and blocked_lock_lock held.
> >> + */
> >> +static void __locks_insert_posix_block(struct file_lock *blocker,
> >> +					struct file_lock *waiter)
> >> +{
> >> +	__locks_insert_block(blocker, waiter);
> >> +	if (!IS_OFDLCK(blocker))
> >>  		locks_insert_global_blocked(waiter);
> >>  }
> >>
> > 
> > In many ways OFD locks act more like flock locks than POSIX ones. In
> > particular, there is no deadlock detection there, so once your
> > conversion is done to more widely use the percpu locks, then you should
> > be able to avoid taking the blocked_lock_lock for OFD locks as well.
> > The 4th patch in this series doesn't currently do that.
> > 
> > You may want to revisit this patch such that the IS_OFDLCK checks are
> > done earlier, so that you can avoid taking the blocked_lock_lock if
> > IS_POSIX and !IS_OFDLCK.
> 
> Thanks for the explanation. I was not entirely sure what to do here
> and forgot to ask.
> 
> I have fixed that stuff and now I am testing it. Though it seems
> that there is a memory leak which can be triggered with 
> 
> 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
> 
> and this happens also without any of my patches. Still trying to
> figure out what's happening. Hopefully I just see a ghost.
> 
> slabtop tells me that ftrace_event_field is constantly growing:
> 
>  Active / Total Objects (% used)    : 968819303 / 968828665 (100.0%)
>  Active / Total Slabs (% used)      : 11404623 / 11404623 (100.0%)
>  Active / Total Caches (% used)     : 72 / 99 (72.7%)
>  Active / Total Size (% used)       : 45616199.68K / 45619608.73K (100.0%)
>  Minimum / Average / Maximum Object : 0.01K / 0.05K / 16.00K
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
> 967510630 967510630   2%    0.05K 11382478       85  45529912K ftrace_event_field
> 154368 154368 100%    0.03K   1206      128      4824K kmalloc-32
> 121856 121856 100%    0.01K    238      512       952K kmalloc-8
> 121227 121095  99%    0.08K   2377       51      9508K Acpi-State
> 
> This is on proper hardware. On a kvm guest, fasync_cache grows fast and finally the
> guest runs out of memory. systemd tries hard to restart everything and fails constantly:
> 
> [  187.021758] systemd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
> [  187.022337] systemd cpuset=/ mems_allowed=0
> [  187.022662] CPU: 3 PID: 1 Comm: systemd Not tainted 4.0.0-rc1+ #380
> [  187.023117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [  187.023801]  ffff88007c918000 ffff88007c9179c8 ffffffff81b4f9be ffffffff8116a9cc
> [  187.024373]  0000000000000000 ffff88007c917a88 ffffffff8116a9d1 000000007c917a58
> [  187.024940]  ffffffff8224bc98 ffff88007c917a28 0000000000000092 ffffffff81c1b780
> [  187.025515] Call Trace:
> [  187.025698]  [<ffffffff81b4f9be>] dump_stack+0x4c/0x65
> [  187.026083]  [<ffffffff8116a9cc>] ? dump_header.isra.13+0x7c/0x450
> [  187.026525]  [<ffffffff8116a9d1>] dump_header.isra.13+0x81/0x450
> [  187.026958]  [<ffffffff810a45c6>] ? trace_hardirqs_on_caller+0x16/0x240
> [  187.027437]  [<ffffffff810a47fd>] ? trace_hardirqs_on+0xd/0x10
> [  187.027859]  [<ffffffff814fe5c4>] ? ___ratelimit+0x84/0x110
> [  187.028264]  [<ffffffff8116b378>] oom_kill_process+0x1e8/0x4c0
> [  187.028683]  [<ffffffff8105fda5>] ? has_ns_capability_noaudit+0x5/0x170
> [  187.029167]  [<ffffffff8116baf4>] __out_of_memory+0x4a4/0x510
> [  187.029579]  [<ffffffff8116bd2b>] out_of_memory+0x5b/0x80
> [  187.029970]  [<ffffffff81170f2e>] __alloc_pages_nodemask+0xa0e/0xb60
> [  187.030434]  [<ffffffff811ad863>] read_swap_cache_async+0xe3/0x180
> [  187.030881]  [<ffffffff811ad9ed>] swapin_readahead+0xed/0x190
> [  187.031300]  [<ffffffff8119bcae>] handle_mm_fault+0xbbe/0x1180
> [  187.031719]  [<ffffffff81046bed>] __do_page_fault+0x1ed/0x4c0
> [  187.032138]  [<ffffffff81046ecc>] do_page_fault+0xc/0x10
> [  187.032520]  [<ffffffff81b5ddc2>] page_fault+0x22/0x30
> [  187.032889] Mem-Info:
> [  187.033066] DMA per-cpu:
> [  187.033254] CPU    0: hi:    0, btch:   1 usd:   0
> [  187.033596] CPU    1: hi:    0, btch:   1 usd:   0
> [  187.033941] CPU    2: hi:    0, btch:   1 usd:   0
> [  187.034292] CPU    3: hi:    0, btch:   1 usd:   0
> [  187.034637] DMA32 per-cpu:
> [  187.034837] CPU    0: hi:  186, btch:  31 usd:  51
> [  187.035185] CPU    1: hi:  186, btch:  31 usd:   0
> [  187.035529] CPU    2: hi:  186, btch:  31 usd:   0
> [  187.035873] CPU    3: hi:  186, btch:  31 usd:  32
> [  187.036221] active_anon:5 inactive_anon:0 isolated_anon:0
> [  187.036221]  active_file:238 inactive_file:194 isolated_file:0
> [  187.036221]  unevictable:0 dirty:0 writeback:8 unstable:0
> [  187.036221]  free:3361 slab_reclaimable:4651 slab_unreclaimable:493909
> [  187.036221]  mapped:347 shmem:0 pagetables:400 bounce:0
> [  187.036221]  free_cma:0
> [  187.038385] DMA free:7848kB min:44kB low:52kB high:64kB active_anon:4kB inactive_anon:12kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:8kB mapped:4kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:7880kB kernel_stack:32kB pagetables:36kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:132 all_unreclaimable? yes
> [  187.041138] lowmem_reserve[]: 0 1952 1952 1952
> [  187.041510] DMA32 free:5596kB min:5628kB low:7032kB high:8440kB active_anon:16kB inactive_anon:0kB active_file:952kB inactive_file:772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2004912kB mlocked:0kB dirty:0kB writeback:24kB mapped:1384kB shmem:0kB slab_reclaimable:18592kB slab_unreclaimable:1967756kB kernel_stack:1968kB pagetables:1564kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12716 all_unreclaimable? yes
> [  187.044442] lowmem_reserve[]: 0 0 0 0
> [  187.044756] DMA: 4*4kB (UEM) 2*8kB (UM) 5*16kB (UEM) 2*32kB (UE) 2*64kB (EM) 3*128kB (UEM) 2*256kB (EM) 3*512kB (UEM) 3*1024kB (UEM) 1*2048kB (R) 0*4096kB = 7856kB
> [  187.046022] DMA32: 190*4kB (UER) 6*8kB (R) 1*16kB (R) 1*32kB (R) 0*64kB 0*128kB 1*256kB (R) 1*512kB (R) 0*1024kB 0*2048kB 1*4096kB (R) = 5720kB
> [  187.047128] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [  187.047724] 554 total pagecache pages
> [  187.047991] 60 pages in swap cache
> [  187.048259] Swap cache stats: add 102769, delete 102709, find 75688/136761
> [  187.048748] Free swap  = 1041456kB
> [  187.048995] Total swap = 1048572kB
> [  187.049250] 524158 pages RAM
> [  187.049463] 0 pages HighMem/MovableOnly
> [  187.049739] 18953 pages reserved
> [  187.049974] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
> [  187.050587] [ 1293]     0  1293    10283        1      23       2      131         -1000 systemd-udevd
> [  187.051253] [ 1660]     0  1660    12793       57      24       2      134         -1000 auditd
> [  187.051872] [ 1681]    81  1681     6637        1      18       2      124          -900 dbus-daemon
> [  187.052529] [ 1725]     0  1725    20707        0      42       3      216         -1000 sshd
> [  187.053146] [ 2344]     0  2344     3257        0      11       2       49             0 systemd-cgroups
> [  187.053820] [ 2345]     0  2345     3257        0      11       2       55             0 systemd-cgroups
> [  187.054497] [ 2350]     0  2350     3257        0      11       2       35             0 systemd-cgroups
> [  187.055175] [ 2352]     0  2352     3257        0      12       2       37             0 systemd-cgroups
> [  187.055846] [ 2354]     0  2354     3257        0      11       2       43             0 systemd-cgroups
> [  187.056530] [ 2355]     0  2355     3257        0      11       2       40             0 systemd-cgroups
> [  187.057212] [ 2356]     0  2356     3257        0      11       2       44             0 systemd-cgroups
> [  187.057886] [ 2362]     0  2362     3257        0      11       3       33             0 systemd-cgroups
> [  187.058564] [ 2371]     0  2371     3257        0      11       2       33             0 systemd-cgroups
> [  187.059244] [ 2372]     0  2372     3257        0      10       2       44             0 systemd-cgroups
> [  187.059917] [ 2373]     0  2373     3257        0      11       2       39             0 systemd-cgroups
> [  187.060600] [ 2376]     0  2376     3257        0      11       2       34             0 systemd-cgroups
> [  187.061280] [ 2377]     0  2377     3257        0      10       2       43             0 systemd-cgroups
> [  187.061942] [ 2378]     0  2378     3257        0      12       3       34             0 systemd-cgroups
> [  187.062598] [ 2379]     0  2379    27502        0      10       3       33             0 agetty
> [  187.063200] [ 2385]     0  2385     3257        0      12       2       44             0 systemd-cgroups
> [  187.063859] [ 2390]     0  2390     3257        0      11       2       43             0 systemd-cgroups
> [  187.064520] [ 2394]     0  2394     3257        0      11       2       41             0 systemd-cgroups
> [  187.065182] [ 2397]     0  2397     3257        0      11       2       43             0 systemd-cgroups
> [  187.065833] [ 2402]     0  2402     3257        0      11       2       42             0 systemd-cgroups
> [  187.066490] [ 2403]     0  2403     3257        0      11       2       44             0 systemd-cgroups
> [  187.067148] [ 2404]     0  2404    27502        0      13       3       30             0 agetty
> [  187.067743] [ 2410]     0  2410     3257        0      11       2       43             0 systemd-cgroups
> [  187.068407] [ 2413]     0  2413     3257        0      11       2       36             0 systemd-cgroups
> [  187.069072] [ 2416]     0  2416     3257        0      11       2       49             0 systemd-cgroups
> [  187.069720] [ 2417]     0  2417    11861      173      26       2      334             0 (journald)
> [  187.070368] Out of memory: Kill process 2417 ((journald)) score 0 or sacrifice child
> [  187.070943] Killed process 2417 ((journald)) total-vm:47444kB, anon-rss:0kB, file-rss:692kB
> [  187.513857] systemd[1]: Unit systemd-logind.service entered failed state.
> [  188.262477] systemd[1]: Unit systemd-journald.service entered failed state.
> [  188.315222] systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
> [  188.334194] systemd[1]: Stopping Login Service...
> [  188.341556] systemd[1]: Starting Login Service...
> [  188.408787] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
> [  189.284506] systemd[1]: Stopping Journal Service...
> [  189.330806] systemd[1]: Starting Journal Service...
> [  189.384800] systemd[1]: Started Journal Service.
> 
> 
> cheers,
> daniel
> 

I pulled down the most recent Fedora rawhide kernel today:

    4.0.0-0.rc2.git0.1.fc23.x86_64

...and with that, I can't reproduce this. The ftrace_event_field slab
(which is shared by the fasync_struct cache) seems to stay under
control. I see it hover around 3-4M in size while the test is running
but the box isn't falling over or anything.
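
As an aside: with SLUB, caches that have been merged show up in sysfs as
symlinks to the same underlying directory, so something along these lines
(assuming SLUB and sysfs) shows which caches are really feeding that slab;
alternatively, booting with slub_nomerge keeps the caches separate:

	# fasync_cache is an alias; list everything merged into the same cache
	readlink /sys/kernel/slab/fasync_cache
	ls -l /sys/kernel/slab | grep "$(readlink /sys/kernel/slab/fasync_cache)"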

Perhaps this was an MM or RCU bug that is now fixed? Can you confirm
whether you're still able to reproduce it with the most recent mainline
kernels?

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 21:01       ` Jeff Layton
@ 2015-03-04 21:12         ` Jeff Layton
  2015-03-04 21:13         ` Daniel Wagner
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2015-03-04 21:12 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On Wed, 4 Mar 2015 16:01:36 -0500
Jeff Layton <jlayton@poochiereds.net> wrote:

> On Wed, 4 Mar 2015 15:20:33 +0100
> Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> 
> > On 03/03/2015 01:55 AM, Jeff Layton wrote:
> > > On Mon,  2 Mar 2015 15:25:12 +0100
> > > Daniel Wagner <daniel.wagner@bmw-carit.de> wrote:
> > > 
> > >> The locks_insert/delete_block() functions are used for flock, posix
> > >> and leases types. blocked_lock_lock is used to serialize all access to
> > >> fl_link, fl_block, fl_next and blocked_hash. Here, we prepare the
> > >> stage for using blocked_lock_lock to protect blocked_hash.
> > >>
> > >> Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> > >> Cc: Jeff Layton <jlayton@poochiereds.net>
> > >> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> > >> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > >> ---
> > >>  fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
> > >>  1 file changed, 40 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/fs/locks.c b/fs/locks.c
> > >> index 4498da0..02821dd 100644
> > >> --- a/fs/locks.c
> > >> +++ b/fs/locks.c
> > >> @@ -611,11 +611,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
> > >>   */
> > >>  static void __locks_delete_block(struct file_lock *waiter)
> > >>  {
> > >> -	locks_delete_global_blocked(waiter);
> > >>  	list_del_init(&waiter->fl_block);
> > >>  	waiter->fl_next = NULL;
> > >>  }
> > >>  
> > >> +/* Posix block variant of __locks_delete_block.
> > >> + *
> > >> + * Must be called with blocked_lock_lock held.
> > >> + */
> > >> +static void __locks_delete_posix_block(struct file_lock *waiter)
> > >> +{
> > >> +	locks_delete_global_blocked(waiter);
> > >> +	__locks_delete_block(waiter);
> > >> +}
> > >> +
> > >>  static void locks_delete_block(struct file_lock *waiter)
> > >>  {
> > >>  	spin_lock(&blocked_lock_lock);
> > >> @@ -623,6 +632,13 @@ static void locks_delete_block(struct file_lock *waiter)
> > >>  	spin_unlock(&blocked_lock_lock);
> > >>  }
> > >>  
> > >> +static void locks_delete_posix_block(struct file_lock *waiter)
> > >> +{
> > >> +	spin_lock(&blocked_lock_lock);
> > >> +	__locks_delete_posix_block(waiter);
> > >> +	spin_unlock(&blocked_lock_lock);
> > >> +}
> > >> +
> > >>  /* Insert waiter into blocker's block list.
> > >>   * We use a circular list so that processes can be easily woken up in
> > >>   * the order they blocked. The documentation doesn't require this but
> > >> @@ -639,7 +655,17 @@ static void __locks_insert_block(struct file_lock *blocker,
> > >>  	BUG_ON(!list_empty(&waiter->fl_block));
> > >>  	waiter->fl_next = blocker;
> > >>  	list_add_tail(&waiter->fl_block, &blocker->fl_block);
> > >> -	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
> > >> +}
> > >> +
> > >> +/* Posix block variant of __locks_insert_block.
> > >> + *
> > >> + * Must be called with flc_lock and blocked_lock_lock held.
> > >> + */
> > >> +static void __locks_insert_posix_block(struct file_lock *blocker,
> > >> +					struct file_lock *waiter)
> > >> +{
> > >> +	__locks_insert_block(blocker, waiter);
> > >> +	if (!IS_OFDLCK(blocker))
> > >>  		locks_insert_global_blocked(waiter);
> > >>  }
> > >>
> > > 
> > > In many ways OFD locks act more like flock locks than POSIX ones. In
> > > particular, there is no deadlock detection there, so once your
> > > conversion is done to more widely use the percpu locks, then you should
> > > be able to avoid taking the blocked_lock_lock for OFD locks as well.
> > > The 4th patch in this series doesn't currently do that.
> > > 
> > > You may want to revisit this patch such that the IS_OFDLCK checks are
> > > done earlier, so that you can avoid taking the blocked_lock_lock if
> > > IS_POSIX and !IS_OFDLCK.
> > 
> > Thanks for the explanation. I was not entirely sure what to do here
> > and forgot to ask.
> > 
> > I have fixed that stuff and now I am testing it. Though it seems
> > that there is a memory leak which can be triggered with 
> > 
> > 	while true; rm -rf /tmp/a; ./lease02 /tmp/a; done
> > 
> > and this happens also without any of my patches. Still trying to
> > figure out what's happening. Hopefully I just see a ghost.
> > 
> > slabtop tells me that ftrace_event_field is constantly growing:
> > 
> >  Active / Total Objects (% used)    : 968819303 / 968828665 (100.0%)
> >  Active / Total Slabs (% used)      : 11404623 / 11404623 (100.0%)
> >  Active / Total Caches (% used)     : 72 / 99 (72.7%)
> >  Active / Total Size (% used)       : 45616199.68K / 45619608.73K (100.0%)
> >  Minimum / Average / Maximum Object : 0.01K / 0.05K / 16.00K
> > 
> >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
> > 967510630 967510630   2%    0.05K 11382478       85  45529912K ftrace_event_field
> > 154368 154368 100%    0.03K   1206      128      4824K kmalloc-32
> > 121856 121856 100%    0.01K    238      512       952K kmalloc-8
> > 121227 121095  99%    0.08K   2377       51      9508K Acpi-State
> > 
> > This is on proper hardware. On a kvm guest, fasync_cache grows fast and finally the
> > guest runs out of memory. systemd tries hard to restart everything and fails constantly:
> > 
> > [  187.021758] systemd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
> > [  187.022337] systemd cpuset=/ mems_allowed=0
> > [  187.022662] CPU: 3 PID: 1 Comm: systemd Not tainted 4.0.0-rc1+ #380
> > [  187.023117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> > [  187.023801]  ffff88007c918000 ffff88007c9179c8 ffffffff81b4f9be ffffffff8116a9cc
> > [  187.024373]  0000000000000000 ffff88007c917a88 ffffffff8116a9d1 000000007c917a58
> > [  187.024940]  ffffffff8224bc98 ffff88007c917a28 0000000000000092 ffffffff81c1b780
> > [  187.025515] Call Trace:
> > [  187.025698]  [<ffffffff81b4f9be>] dump_stack+0x4c/0x65
> > [  187.026083]  [<ffffffff8116a9cc>] ? dump_header.isra.13+0x7c/0x450
> > [  187.026525]  [<ffffffff8116a9d1>] dump_header.isra.13+0x81/0x450
> > [  187.026958]  [<ffffffff810a45c6>] ? trace_hardirqs_on_caller+0x16/0x240
> > [  187.027437]  [<ffffffff810a47fd>] ? trace_hardirqs_on+0xd/0x10
> > [  187.027859]  [<ffffffff814fe5c4>] ? ___ratelimit+0x84/0x110
> > [  187.028264]  [<ffffffff8116b378>] oom_kill_process+0x1e8/0x4c0
> > [  187.028683]  [<ffffffff8105fda5>] ? has_ns_capability_noaudit+0x5/0x170
> > [  187.029167]  [<ffffffff8116baf4>] __out_of_memory+0x4a4/0x510
> > [  187.029579]  [<ffffffff8116bd2b>] out_of_memory+0x5b/0x80
> > [  187.029970]  [<ffffffff81170f2e>] __alloc_pages_nodemask+0xa0e/0xb60
> > [  187.030434]  [<ffffffff811ad863>] read_swap_cache_async+0xe3/0x180
> > [  187.030881]  [<ffffffff811ad9ed>] swapin_readahead+0xed/0x190
> > [  187.031300]  [<ffffffff8119bcae>] handle_mm_fault+0xbbe/0x1180
> > [  187.031719]  [<ffffffff81046bed>] __do_page_fault+0x1ed/0x4c0
> > [  187.032138]  [<ffffffff81046ecc>] do_page_fault+0xc/0x10
> > [  187.032520]  [<ffffffff81b5ddc2>] page_fault+0x22/0x30
> > [  187.032889] Mem-Info:
> > [  187.033066] DMA per-cpu:
> > [  187.033254] CPU    0: hi:    0, btch:   1 usd:   0
> > [  187.033596] CPU    1: hi:    0, btch:   1 usd:   0
> > [  187.033941] CPU    2: hi:    0, btch:   1 usd:   0
> > [  187.034292] CPU    3: hi:    0, btch:   1 usd:   0
> > [  187.034637] DMA32 per-cpu:
> > [  187.034837] CPU    0: hi:  186, btch:  31 usd:  51
> > [  187.035185] CPU    1: hi:  186, btch:  31 usd:   0
> > [  187.035529] CPU    2: hi:  186, btch:  31 usd:   0
> > [  187.035873] CPU    3: hi:  186, btch:  31 usd:  32
> > [  187.036221] active_anon:5 inactive_anon:0 isolated_anon:0
> > [  187.036221]  active_file:238 inactive_file:194 isolated_file:0
> > [  187.036221]  unevictable:0 dirty:0 writeback:8 unstable:0
> > [  187.036221]  free:3361 slab_reclaimable:4651 slab_unreclaimable:493909
> > [  187.036221]  mapped:347 shmem:0 pagetables:400 bounce:0
> > [  187.036221]  free_cma:0
> > [  187.038385] DMA free:7848kB min:44kB low:52kB high:64kB active_anon:4kB inactive_anon:12kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:8kB mapped:4kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:7880kB kernel_stack:32kB pagetables:36kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:132 all_unreclaimable? yes
> > [  187.041138] lowmem_reserve[]: 0 1952 1952 1952
> > [  187.041510] DMA32 free:5596kB min:5628kB low:7032kB high:8440kB active_anon:16kB inactive_anon:0kB active_file:952kB inactive_file:772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2004912kB mlocked:0kB dirty:0kB writeback:24kB mapped:1384kB shmem:0kB slab_reclaimable:18592kB slab_unreclaimable:1967756kB kernel_stack:1968kB pagetables:1564kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12716 all_unreclaimable? yes
> > [  187.044442] lowmem_reserve[]: 0 0 0 0
> > [  187.044756] DMA: 4*4kB (UEM) 2*8kB (UM) 5*16kB (UEM) 2*32kB (UE) 2*64kB (EM) 3*128kB (UEM) 2*256kB (EM) 3*512kB (UEM) 3*1024kB (UEM) 1*2048kB (R) 0*4096kB = 7856kB
> > [  187.046022] DMA32: 190*4kB (UER) 6*8kB (R) 1*16kB (R) 1*32kB (R) 0*64kB 0*128kB 1*256kB (R) 1*512kB (R) 0*1024kB 0*2048kB 1*4096kB (R) = 5720kB
> > [  187.047128] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [  187.047724] 554 total pagecache pages
> > [  187.047991] 60 pages in swap cache
> > [  187.048259] Swap cache stats: add 102769, delete 102709, find 75688/136761
> > [  187.048748] Free swap  = 1041456kB
> > [  187.048995] Total swap = 1048572kB
> > [  187.049250] 524158 pages RAM
> > [  187.049463] 0 pages HighMem/MovableOnly
> > [  187.049739] 18953 pages reserved
> > [  187.049974] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
> > [  187.050587] [ 1293]     0  1293    10283        1      23       2      131         -1000 systemd-udevd
> > [  187.051253] [ 1660]     0  1660    12793       57      24       2      134         -1000 auditd
> > [  187.051872] [ 1681]    81  1681     6637        1      18       2      124          -900 dbus-daemon
> > [  187.052529] [ 1725]     0  1725    20707        0      42       3      216         -1000 sshd
> > [  187.053146] [ 2344]     0  2344     3257        0      11       2       49             0 systemd-cgroups
> > [  187.053820] [ 2345]     0  2345     3257        0      11       2       55             0 systemd-cgroups
> > [  187.054497] [ 2350]     0  2350     3257        0      11       2       35             0 systemd-cgroups
> > [  187.055175] [ 2352]     0  2352     3257        0      12       2       37             0 systemd-cgroups
> > [  187.055846] [ 2354]     0  2354     3257        0      11       2       43             0 systemd-cgroups
> > [  187.056530] [ 2355]     0  2355     3257        0      11       2       40             0 systemd-cgroups
> > [  187.057212] [ 2356]     0  2356     3257        0      11       2       44             0 systemd-cgroups
> > [  187.057886] [ 2362]     0  2362     3257        0      11       3       33             0 systemd-cgroups
> > [  187.058564] [ 2371]     0  2371     3257        0      11       2       33             0 systemd-cgroups
> > [  187.059244] [ 2372]     0  2372     3257        0      10       2       44             0 systemd-cgroups
> > [  187.059917] [ 2373]     0  2373     3257        0      11       2       39             0 systemd-cgroups
> > [  187.060600] [ 2376]     0  2376     3257        0      11       2       34             0 systemd-cgroups
> > [  187.061280] [ 2377]     0  2377     3257        0      10       2       43             0 systemd-cgroups
> > [  187.061942] [ 2378]     0  2378     3257        0      12       3       34             0 systemd-cgroups
> > [  187.062598] [ 2379]     0  2379    27502        0      10       3       33             0 agetty
> > [  187.063200] [ 2385]     0  2385     3257        0      12       2       44             0 systemd-cgroups
> > [  187.063859] [ 2390]     0  2390     3257        0      11       2       43             0 systemd-cgroups
> > [  187.064520] [ 2394]     0  2394     3257        0      11       2       41             0 systemd-cgroups
> > [  187.065182] [ 2397]     0  2397     3257        0      11       2       43             0 systemd-cgroups
> > [  187.065833] [ 2402]     0  2402     3257        0      11       2       42             0 systemd-cgroups
> > [  187.066490] [ 2403]     0  2403     3257        0      11       2       44             0 systemd-cgroups
> > [  187.067148] [ 2404]     0  2404    27502        0      13       3       30             0 agetty
> > [  187.067743] [ 2410]     0  2410     3257        0      11       2       43             0 systemd-cgroups
> > [  187.068407] [ 2413]     0  2413     3257        0      11       2       36             0 systemd-cgroups
> > [  187.069072] [ 2416]     0  2416     3257        0      11       2       49             0 systemd-cgroups
> > [  187.069720] [ 2417]     0  2417    11861      173      26       2      334             0 (journald)
> > [  187.070368] Out of memory: Kill process 2417 ((journald)) score 0 or sacrifice child
> > [  187.070943] Killed process 2417 ((journald)) total-vm:47444kB, anon-rss:0kB, file-rss:692kB
> > [  187.513857] systemd[1]: Unit systemd-logind.service entered failed state.
> > [  188.262477] systemd[1]: Unit systemd-journald.service entered failed state.
> > [  188.315222] systemd[1]: systemd-logind.service holdoff time over, scheduling restart.
> > [  188.334194] systemd[1]: Stopping Login Service...
> > [  188.341556] systemd[1]: Starting Login Service...
> > [  188.408787] systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
> > [  189.284506] systemd[1]: Stopping Journal Service...
> > [  189.330806] systemd[1]: Starting Journal Service...
> > [  189.384800] systemd[1]: Started Journal Service.
> > 
> > 
> > cheers,
> > daniel
> > 
> 
> I pulled down the most recent Fedora rawhide kernel today:
> 
>     4.0.0-0.rc2.git0.1.fc23.x86_64
> 
> ...and with that, I can't reproduce this. The ftrace_event_field slab
> (which is shared by the fasync_struct cache) seems to stay under
> control. I see it hover around 3-4M in size while the test is running
> but the box isn't falling over or anything.
> 
> Perhaps this was an MM or RCU bug that is now fixed? Can you confirm
> whether you're still able to reproduce it with the most recent mainline
> kernels?
> 

Oh! I take it back. I was testing with your patched lease02 test. With
the unpatched one I can definitely still reproduce it. That suggests
that the problem occurs when we go to clean up leases as the file is
being closed. I'll have a close look at that code.

-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts
  2015-03-04 21:01       ` Jeff Layton
  2015-03-04 21:12         ` Jeff Layton
@ 2015-03-04 21:13         ` Daniel Wagner
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Wagner @ 2015-03-04 21:13 UTC (permalink / raw)
  To: Jeff Layton, Daniel Wagner
  Cc: Jeff Layton, linux-fsdevel, linux-kernel, J. Bruce Fields,
	Alexander Viro

On 03/04/2015 10:01 PM, Jeff Layton wrote:
> I pulled down the most recent Fedora rawhide kernel today:
>
>      4.0.0-0.rc2.git0.1.fc23.x86_64
>
> ...and with that, I can't reproduce this. The ftrace_event_field slab
> (which is shared by the fasync_struct cache) seems to stay under
> control. I see it hover around 3-4M in size while the test is running
> but the box isn't falling over or anything.
>
> Perhaps this was an MM or RCU bug that is now fixed? Can you confirm
> whether you're still able to reproduce it with the most recent mainline
> kernels?

Sure, I'll give it a spin tomorrow.



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-03-04 21:13 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-02 14:25 [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Daniel Wagner
2015-03-02 14:25 ` [RFC v2 1/4] locks: Remove unnecessary IS_POSIX test Daniel Wagner
2015-03-03  0:55   ` Jeff Layton
2015-03-02 14:25 ` [RFC v2 2/4] locks: Add lockdep assertion for blocked_lock_lock Daniel Wagner
2015-03-03  0:55   ` Jeff Layton
2015-03-02 14:25 ` [RFC v2 3/4] locks: Split insert/delete block functions into flock/posix parts Daniel Wagner
2015-03-03  0:55   ` Jeff Layton
2015-03-04 14:20     ` Daniel Wagner
2015-03-04 15:00       ` Boaz Harrosh
2015-03-04 15:32         ` Daniel Wagner
2015-03-04 17:59           ` Jeff Layton
2015-03-04 19:16             ` Jeff Layton
2015-03-04 21:01       ` Jeff Layton
2015-03-04 21:12         ` Jeff Layton
2015-03-04 21:13         ` Daniel Wagner
2015-03-02 14:25 ` [RFC v2 4/4] locks: Use blocked_lock_lock only to protect blocked_hash Daniel Wagner
2015-03-03  0:58   ` Jeff Layton
2015-03-02 15:23 ` [RFC v2 0/4] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock Jeff Layton
2015-03-02 16:44   ` Daniel Wagner
