[PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts
@ 2018-07-10 10:02 Jan Kara
  2018-07-10 10:02 ` [PATCH 01/10] audit_tree: Remove mark->lock locking Jan Kara
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Hello,

this is a second revision of the series that addresses problems I have
identified when trying to understand how exactly is kernel/audit_tree.c using
generic fsnotify framework. I hope I have understood all the interactions right
but careful review is certainly welcome.

The patches have been tested by a stress test I have written which mounts &
unmounts filesystems in the directory tree while adding and removing audit
rules for this tree in parallel and accessing the tree to generate events.
Still some real-world testing would be welcome.

Changes since v1:
* Split the last patch to ease review
* Rewrite test script so that it can be included in audit testsuite
* Some cleanups and improvements suggested by Amir

								Honza

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/10] audit_tree: Remove mark->lock locking
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 02/10] audit: Fix possible spurious -ENOSPC error Jan Kara
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Currently, audit_tree code uses mark->lock to protect against detaching
of mark from an inode. In most places it however also uses
mark->group->mark_mutex (as we need to atomically replace attached
marks) and this provides protection against mark detaching as well. So
just remove protection with mark->lock from audit tree code and replace
it with mark->group->mark_mutex protection in all the places. It
simplifies the code and gets rid of some ugly catches like calling
fsnotify_add_mark_locked() with mark->lock held (which cannot sleep only
because we hold a reference to another mark attached to the same inode).

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 02feef939560..1c82eb6674c4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -193,7 +193,7 @@ static inline struct list_head *chunk_hash(unsigned long key)
 	return chunk_hash_heads + n % HASH_SIZE;
 }
 
-/* hash_lock & entry->lock is held by caller */
+/* hash_lock & entry->group->mark_mutex is held by caller */
 static void insert_hash(struct audit_chunk *chunk)
 {
 	unsigned long key = chunk_to_key(chunk);
@@ -256,13 +256,11 @@ static void untag_chunk(struct node *p)
 		new = alloc_chunk(size);
 
 	mutex_lock(&entry->group->mark_mutex);
-	spin_lock(&entry->lock);
 	/*
 	 * mark_mutex protects mark from getting detached and thus also from
 	 * mark->connector->obj getting NULL.
 	 */
 	if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
-		spin_unlock(&entry->lock);
 		mutex_unlock(&entry->group->mark_mutex);
 		if (new)
 			fsnotify_put_mark(&new->mark);
@@ -280,7 +278,6 @@ static void untag_chunk(struct node *p)
 		list_del_init(&p->list);
 		list_del_rcu(&chunk->hash);
 		spin_unlock(&hash_lock);
-		spin_unlock(&entry->lock);
 		mutex_unlock(&entry->group->mark_mutex);
 		fsnotify_destroy_mark(entry, audit_tree_group);
 		goto out;
@@ -323,7 +320,6 @@ static void untag_chunk(struct node *p)
 	list_for_each_entry(owner, &new->trees, same_root)
 		owner->root = new;
 	spin_unlock(&hash_lock);
-	spin_unlock(&entry->lock);
 	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_destroy_mark(entry, audit_tree_group);
 	fsnotify_put_mark(&new->mark);	/* drop initial reference */
@@ -340,7 +336,6 @@ static void untag_chunk(struct node *p)
 	p->owner = NULL;
 	put_tree(owner);
 	spin_unlock(&hash_lock);
-	spin_unlock(&entry->lock);
 	mutex_unlock(&entry->group->mark_mutex);
 out:
 	fsnotify_put_mark(entry);
@@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOSPC;
 	}
 
-	spin_lock(&entry->lock);
+	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&hash_lock);
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
-		spin_unlock(&entry->lock);
+		mutex_unlock(&entry->group->mark_mutex);
 		fsnotify_destroy_mark(entry, audit_tree_group);
 		fsnotify_put_mark(entry);
 		return 0;
@@ -380,7 +375,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	}
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
-	spin_unlock(&entry->lock);
+	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_put_mark(entry);	/* drop initial reference */
 	return 0;
 }
@@ -421,14 +416,12 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	chunk_entry = &chunk->mark;
 
 	mutex_lock(&old_entry->group->mark_mutex);
-	spin_lock(&old_entry->lock);
 	/*
 	 * mark_mutex protects mark from getting detached and thus also from
 	 * mark->connector->obj getting NULL.
 	 */
 	if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
 		/* old_entry is being shot, lets just lie */
-		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		fsnotify_put_mark(&chunk->mark);
@@ -437,23 +430,16 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
 				     FSNOTIFY_OBJ_TYPE_INODE, 1)) {
-		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return -ENOSPC;
 	}
 
-	/* even though we hold old_entry->lock, this is safe since chunk_entry->lock could NEVER have been grabbed before */
-	spin_lock(&chunk_entry->lock);
 	spin_lock(&hash_lock);
-
-	/* we now hold old_entry->lock, chunk_entry->lock, and hash_lock */
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
-		spin_unlock(&chunk_entry->lock);
-		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
 
 		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
@@ -485,8 +471,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		list_add(&tree->same_root, &chunk->trees);
 	}
 	spin_unlock(&hash_lock);
-	spin_unlock(&chunk_entry->lock);
-	spin_unlock(&old_entry->lock);
 	mutex_unlock(&old_entry->group->mark_mutex);
 	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/10] audit: Fix possible spurious -ENOSPC error
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
  2018-07-10 10:02 ` [PATCH 01/10] audit_tree: Remove mark->lock locking Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 03/10] audit: Fix possible tagging failures Jan Kara
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

When an inode is tagged with a tree, tag_chunk() checks whether there is
audit_tree_group mark attached to the inode and adds one if not. However
nothing protects another tag_chunk() to add the mark between we've
checked and try to add the fsnotify mark thus resulting in an error from
fsnotify_add_mark() and consequently an ENOSPC error from tag_chunk().

Fix the problem by holding mark_mutex over the whole check-insert code
sequence.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 1c82eb6674c4..de8d344d91b1 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
 	spin_lock(&hash_lock);
 }
 
+/* Call with group->mark_mutex held, releases it */
 static int create_chunk(struct inode *inode, struct audit_tree *tree)
 {
 	struct fsnotify_mark *entry;
 	struct audit_chunk *chunk = alloc_chunk(1);
-	if (!chunk)
+
+	if (!chunk) {
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		return -ENOMEM;
+	}
 
 	entry = &chunk->mark;
-	if (fsnotify_add_inode_mark(entry, inode, 0)) {
+	if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(entry);
 		return -ENOSPC;
 	}
 
-	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&hash_lock);
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
-		mutex_unlock(&entry->group->mark_mutex);
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_destroy_mark(entry, audit_tree_group);
 		fsnotify_put_mark(entry);
 		return 0;
@@ -375,7 +379,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	}
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
-	mutex_unlock(&entry->group->mark_mutex);
+	mutex_unlock(&audit_tree_group->mark_mutex);
 	fsnotify_put_mark(entry);	/* drop initial reference */
 	return 0;
 }
@@ -389,6 +393,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	struct node *p;
 	int n;
 
+	mutex_lock(&audit_tree_group->mark_mutex);
 	old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
 				       audit_tree_group);
 	if (!old_entry)
@@ -401,6 +406,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	for (n = 0; n < old->count; n++) {
 		if (old->owners[n].owner == tree) {
 			spin_unlock(&hash_lock);
+			mutex_unlock(&audit_tree_group->mark_mutex);
 			fsnotify_put_mark(old_entry);
 			return 0;
 		}
@@ -409,20 +415,20 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	chunk = alloc_chunk(old->count + 1);
 	if (!chunk) {
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		return -ENOMEM;
 	}
 
 	chunk_entry = &chunk->mark;
 
-	mutex_lock(&old_entry->group->mark_mutex);
 	/*
 	 * mark_mutex protects mark from getting detached and thus also from
 	 * mark->connector->obj getting NULL.
 	 */
 	if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
 		/* old_entry is being shot, lets just lie */
-		mutex_unlock(&old_entry->group->mark_mutex);
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		fsnotify_put_mark(&chunk->mark);
 		return -ENOENT;
@@ -430,7 +436,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
 				     FSNOTIFY_OBJ_TYPE_INODE, 1)) {
-		mutex_unlock(&old_entry->group->mark_mutex);
+		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return -ENOSPC;
@@ -440,7 +446,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
-		mutex_unlock(&old_entry->group->mark_mutex);
+		mutex_unlock(&audit_tree_group->mark_mutex);
 
 		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
 
@@ -471,7 +477,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		list_add(&tree->same_root, &chunk->trees);
 	}
 	spin_unlock(&hash_lock);
-	mutex_unlock(&old_entry->group->mark_mutex);
+	mutex_unlock(&audit_tree_group->mark_mutex);
 	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/10] audit: Fix possible tagging failures
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
  2018-07-10 10:02 ` [PATCH 01/10] audit_tree: Remove mark->lock locking Jan Kara
  2018-07-10 10:02 ` [PATCH 02/10] audit: Fix possible spurious -ENOSPC error Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-10 10:02 ` [PATCH 04/10] audit: Embed key into chunk Jan Kara
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Audit tree code is replacing marks attached to inodes in non-atomic way.
Thus fsnotify_find_mark() in tag_chunk() may find a mark that belongs to
a chunk that is no longer valid one and will soon be destroyed. Tags
added to such chunk will be simply lost.

Fix the problem by making sure old mark is marked as going away (through
fsnotify_detach_mark()) before dropping mark_mutex and thus in an atomic
way wrt tag_chunk(). Note that this does not fix the problem completely
as if tag_chunk() finds a mark that is going away, it fails with
-ENOENT. But at least the failure is not silent and currently there's no
way to search for another fsnotify mark attached to the inode. We'll fix
this problem in later patch.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index de8d344d91b1..7f9df8c66227 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -278,8 +278,9 @@ static void untag_chunk(struct node *p)
 		list_del_init(&p->list);
 		list_del_rcu(&chunk->hash);
 		spin_unlock(&hash_lock);
+		fsnotify_detach_mark(entry);
 		mutex_unlock(&entry->group->mark_mutex);
-		fsnotify_destroy_mark(entry, audit_tree_group);
+		fsnotify_free_mark(entry);
 		goto out;
 	}
 
@@ -320,8 +321,9 @@ static void untag_chunk(struct node *p)
 	list_for_each_entry(owner, &new->trees, same_root)
 		owner->root = new;
 	spin_unlock(&hash_lock);
+	fsnotify_detach_mark(entry);
 	mutex_unlock(&entry->group->mark_mutex);
-	fsnotify_destroy_mark(entry, audit_tree_group);
+	fsnotify_free_mark(entry);
 	fsnotify_put_mark(&new->mark);	/* drop initial reference */
 	goto out;
 
@@ -364,8 +366,9 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
+		fsnotify_detach_mark(entry);
 		mutex_unlock(&audit_tree_group->mark_mutex);
-		fsnotify_destroy_mark(entry, audit_tree_group);
+		fsnotify_free_mark(entry);
 		fsnotify_put_mark(entry);
 		return 0;
 	}
@@ -446,10 +449,9 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
+		fsnotify_detach_mark(chunk_entry);
 		mutex_unlock(&audit_tree_group->mark_mutex);
-
-		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
-
+		fsnotify_free_mark(chunk_entry);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return 0;
@@ -477,8 +479,9 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		list_add(&tree->same_root, &chunk->trees);
 	}
 	spin_unlock(&hash_lock);
+	fsnotify_detach_mark(old_entry);
 	mutex_unlock(&audit_tree_group->mark_mutex);
-	fsnotify_destroy_mark(old_entry, audit_tree_group);
+	fsnotify_free_mark(old_entry);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
 	return 0;
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/10] audit: Embed key into chunk
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (2 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 03/10] audit: Fix possible tagging failures Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 05/10] audit: Make hash table insertion safe against concurrent lookups Jan Kara
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Currently chunk hash key (which is in fact pointer to the inode) is
derived as chunk->mark.conn->obj. It is tricky to make this dereference
reliable for hash table lookups only under RCU as mark can get detached
from the connector and connector gets freed independently of the
running lookup. Thus there is a possible use after free / NULL ptr
dereference issue:

CPU1					CPU2
					untag_chunk()
					  ...
audit_tree_lookup()
  list_for_each_entry_rcu(p, list, hash) {
					  list_del_rcu(&chunk->hash);
					  fsnotify_destroy_mark(entry);
					  fsnotify_put_mark(entry)
    chunk_to_key(p)
      if (!chunk->mark.connector)
					    ...
					    hlist_del_init_rcu(&mark->obj_list);
					    if (hlist_empty(&conn->list)) {
					      inode = fsnotify_detach_connector_from_object(conn);
					    mark->connector = NULL;
					    ...
					    frees connector from workqueue
      chunk->mark.connector->obj

This race is probably impossible to hit in practice as the race window
on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's
better to have this fixed. Since the inode the chunk is attached to is
constant during chunk's lifetime it is easy to cache the key in the
chunk itself and thus avoid these issues.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 7f9df8c66227..a210963bae32 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -24,6 +24,7 @@ struct audit_tree {
 
 struct audit_chunk {
 	struct list_head hash;
+	unsigned long key;
 	struct fsnotify_mark mark;
 	struct list_head trees;		/* with root here */
 	int dead;
@@ -172,21 +173,6 @@ static unsigned long inode_to_key(const struct inode *inode)
 	return (unsigned long)&inode->i_fsnotify_marks;
 }
 
-/*
- * Function to return search key in our hash from chunk. Key 0 is special and
- * should never be present in the hash.
- */
-static unsigned long chunk_to_key(struct audit_chunk *chunk)
-{
-	/*
-	 * We have a reference to the mark so it should be attached to a
-	 * connector.
-	 */
-	if (WARN_ON_ONCE(!chunk->mark.connector))
-		return 0;
-	return (unsigned long)chunk->mark.connector->obj;
-}
-
 static inline struct list_head *chunk_hash(unsigned long key)
 {
 	unsigned long n = key / L1_CACHE_BYTES;
@@ -196,12 +182,12 @@ static inline struct list_head *chunk_hash(unsigned long key)
 /* hash_lock & entry->group->mark_mutex is held by caller */
 static void insert_hash(struct audit_chunk *chunk)
 {
-	unsigned long key = chunk_to_key(chunk);
 	struct list_head *list;
 
 	if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
 		return;
-	list = chunk_hash(key);
+	WARN_ON_ONCE(!chunk->key);
+	list = chunk_hash(chunk->key);
 	list_add_rcu(&chunk->hash, list);
 }
 
@@ -213,7 +199,7 @@ struct audit_chunk *audit_tree_lookup(const struct inode *inode)
 	struct audit_chunk *p;
 
 	list_for_each_entry_rcu(p, list, hash) {
-		if (chunk_to_key(p) == key) {
+		if (p->key == key) {
 			atomic_long_inc(&p->refs);
 			return p;
 		}
@@ -295,6 +281,7 @@ static void untag_chunk(struct node *p)
 
 	chunk->dead = 1;
 	spin_lock(&hash_lock);
+	new->key = chunk->key;
 	list_replace_init(&chunk->trees, &new->trees);
 	if (owner->root == chunk) {
 		list_del_init(&owner->same_root);
@@ -380,6 +367,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		tree->root = chunk;
 		list_add(&tree->same_root, &chunk->trees);
 	}
+	chunk->key = inode_to_key(inode);
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
 	mutex_unlock(&audit_tree_group->mark_mutex);
@@ -456,6 +444,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		fsnotify_put_mark(old_entry);
 		return 0;
 	}
+	chunk->key = old->key;
 	list_replace_init(&old->trees, &chunk->trees);
 	for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
 		struct audit_tree *s = old->owners[n].owner;
@@ -652,7 +641,7 @@ void audit_trim_trees(void)
 			/* this could be NULL if the watch is dying else where... */
 			node->index |= 1U<<31;
 			if (iterate_mounts(compare_root,
-					   (void *)chunk_to_key(chunk),
+					   (void *)(chunk->key),
 					   root_mnt))
 				node->index &= ~(1U<<31);
 		}
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/10] audit: Make hash table insertion safe against concurrent lookups
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (3 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 04/10] audit: Embed key into chunk Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-10 10:02 ` [PATCH 06/10] audit: Factor out chunk replacement code Jan Kara
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Currently, the audit tree code does not make sure that when a chunk in
inserted into the hash table, it is fully initialized. So in theory (in
practice only on DEC Alpha) a user of RCU lookup could see uninitialized
structure in the hash table and crash. Add appropriate barriers between
initialization of the structure and its insertion into hash table.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index a210963bae32..6a01738d1ac2 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -186,6 +186,12 @@ static void insert_hash(struct audit_chunk *chunk)
 
 	if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
 		return;
+	/*
+	 * Make sure chunk is fully initialized before making it visible in the
+	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
+	 * audit_tree_lookup().
+	 */
+	smp_wmb();
 	WARN_ON_ONCE(!chunk->key);
 	list = chunk_hash(chunk->key);
 	list_add_rcu(&chunk->hash, list);
@@ -199,7 +205,11 @@ struct audit_chunk *audit_tree_lookup(const struct inode *inode)
 	struct audit_chunk *p;
 
 	list_for_each_entry_rcu(p, list, hash) {
-		if (p->key == key) {
+		/*
+		 * We use a data dependency barrier in READ_ONCE() to make sure
+		 * the chunk we see is fully initialized.
+		 */
+		if (READ_ONCE(p->key) == key) {
 			atomic_long_inc(&p->refs);
 			return p;
 		}
@@ -304,9 +314,15 @@ static void untag_chunk(struct node *p)
 		list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
 	}
 
-	list_replace_rcu(&chunk->hash, &new->hash);
 	list_for_each_entry(owner, &new->trees, same_root)
 		owner->root = new;
+	/*
+	 * Make sure chunk is fully initialized before making it visible in the
+	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
+	 * audit_tree_lookup().
+	 */
+	smp_wmb();
+	list_replace_rcu(&chunk->hash, &new->hash);
 	spin_unlock(&hash_lock);
 	fsnotify_detach_mark(entry);
 	mutex_unlock(&entry->group->mark_mutex);
@@ -368,6 +384,10 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		list_add(&tree->same_root, &chunk->trees);
 	}
 	chunk->key = inode_to_key(inode);
+	/*
+	 * Inserting into the hash table has to go last as once we do that RCU
+ 	 * readers can see the chunk.
+	 */
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
 	mutex_unlock(&audit_tree_group->mark_mutex);
@@ -459,7 +479,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	p->owner = tree;
 	get_tree(tree);
 	list_add(&p->list, &tree->chunks);
-	list_replace_rcu(&old->hash, &chunk->hash);
 	list_for_each_entry(owner, &chunk->trees, same_root)
 		owner->root = chunk;
 	old->dead = 1;
@@ -467,6 +486,13 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		tree->root = chunk;
 		list_add(&tree->same_root, &chunk->trees);
 	}
+	/*
+	 * Make sure chunk is fully initialized before making it visible in the
+	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
+	 * audit_tree_lookup().
+	 */
+	smp_wmb();
+	list_replace_rcu(&old->hash, &chunk->hash);
 	spin_unlock(&hash_lock);
 	fsnotify_detach_mark(old_entry);
 	mutex_unlock(&audit_tree_group->mark_mutex);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (4 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 05/10] audit: Make hash table insertion safe against concurrent lookups Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-11  7:58   ` Amir Goldstein
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 07/10] audit: Remove pointless check in insert_hash() Jan Kara
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Chunk replacement code is very similar for the cases where we grow or
shrink chunk. Factor the code out into a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
 1 file changed, 40 insertions(+), 46 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 6a01738d1ac2..f419fdfc25b4 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
 	return container_of(p, struct audit_chunk, owners[0]);
 }
 
+static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
+			  struct node *skip)
+{
+	struct audit_tree *owner;
+	int i, j;
+
+	new->key = old->key;
+	list_replace_init(&old->trees, &new->trees);
+	list_for_each_entry(owner, &new->trees, same_root)
+		owner->root = new;
+	for (i = j = 0; j < old->count; i++, j++) {
+		if (&old->owners[j] == skip) {
+			i--;
+			continue;
+		}
+		owner = old->owners[j].owner;
+		new->owners[i].owner = owner;
+		new->owners[i].index = old->owners[j].index - j + i;
+		if (!owner) /* result of earlier fallback */
+			continue;
+		get_tree(owner);
+		list_replace_init(&old->owners[j].list, &new->owners[i].list);
+	}
+	/*
+	 * Make sure chunk is fully initialized before making it visible in the
+	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
+	 * audit_tree_lookup().
+	 */
+	smp_wmb();
+	list_replace_rcu(&old->hash, &new->hash);
+}
+
 static void untag_chunk(struct node *p)
 {
 	struct audit_chunk *chunk = find_chunk(p);
@@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
 	struct audit_chunk *new = NULL;
 	struct audit_tree *owner;
 	int size = chunk->count - 1;
-	int i, j;
 
 	fsnotify_get_mark(entry);
 
@@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
 
 	chunk->dead = 1;
 	spin_lock(&hash_lock);
-	new->key = chunk->key;
-	list_replace_init(&chunk->trees, &new->trees);
 	if (owner->root == chunk) {
 		list_del_init(&owner->same_root);
 		owner->root = NULL;
 	}
-
-	for (i = j = 0; j <= size; i++, j++) {
-		struct audit_tree *s;
-		if (&chunk->owners[j] == p) {
-			list_del_init(&p->list);
-			i--;
-			continue;
-		}
-		s = chunk->owners[j].owner;
-		new->owners[i].owner = s;
-		new->owners[i].index = chunk->owners[j].index - j + i;
-		if (!s) /* result of earlier fallback */
-			continue;
-		get_tree(s);
-		list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
-	}
-
-	list_for_each_entry(owner, &new->trees, same_root)
-		owner->root = new;
+	list_del_init(&p->list);
 	/*
-	 * Make sure chunk is fully initialized before making it visible in the
-	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
-	 * audit_tree_lookup().
+	 * This has to go last when updating chunk as once replace_chunk() is
+	 * called, new RCU readers can see the new chunk.
 	 */
-	smp_wmb();
-	list_replace_rcu(&chunk->hash, &new->hash);
+	replace_chunk(new, chunk, p);
 	spin_unlock(&hash_lock);
 	fsnotify_detach_mark(entry);
 	mutex_unlock(&entry->group->mark_mutex);
@@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 {
 	struct fsnotify_mark *old_entry, *chunk_entry;
-	struct audit_tree *owner;
 	struct audit_chunk *chunk, *old;
 	struct node *p;
 	int n;
@@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		fsnotify_put_mark(old_entry);
 		return 0;
 	}
-	chunk->key = old->key;
-	list_replace_init(&old->trees, &chunk->trees);
-	for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
-		struct audit_tree *s = old->owners[n].owner;
-		p->owner = s;
-		p->index = old->owners[n].index;
-		if (!s) /* result of fallback in untag */
-			continue;
-		get_tree(s);
-		list_replace_init(&old->owners[n].list, &p->list);
-	}
+	p = &chunk->owners[chunk->count - 1];
 	p->index = (chunk->count - 1) | (1U<<31);
 	p->owner = tree;
 	get_tree(tree);
 	list_add(&p->list, &tree->chunks);
-	list_for_each_entry(owner, &chunk->trees, same_root)
-		owner->root = chunk;
 	old->dead = 1;
 	if (!tree->root) {
 		tree->root = chunk;
 		list_add(&tree->same_root, &chunk->trees);
 	}
 	/*
-	 * Make sure chunk is fully initialized before making it visible in the
-	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
-	 * audit_tree_lookup().
+	 * This has to go last when updating chunk as once replace_chunk() is
+	 * called, new RCU readers can see the new chunk.
 	 */
-	smp_wmb();
-	list_replace_rcu(&old->hash, &chunk->hash);
+	replace_chunk(chunk, old, NULL);
 	spin_unlock(&hash_lock);
 	fsnotify_detach_mark(old_entry);
 	mutex_unlock(&audit_tree_group->mark_mutex);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/10] audit: Remove pointless check in insert_hash()
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (5 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 06/10] audit: Factor out chunk replacement code Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 08/10] audit: Provide helper for dropping mark's chunk reference Jan Kara
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

The audit_tree_group->mark_mutex is held all the time while we create
the fsnotify mark, add it to the inode, and insert chunk into the hash.
Hence mark cannot get detached during this time and so the check whether
the mark is attached in insert_hash() is pointless.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f419fdfc25b4..b7977cf30479 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -184,8 +184,6 @@ static void insert_hash(struct audit_chunk *chunk)
 {
 	struct list_head *list;
 
-	if (!(chunk->mark.flags & FSNOTIFY_MARK_FLAG_ATTACHED))
-		return;
 	/*
 	 * Make sure chunk is fully initialized before making it visible in the
 	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/10] audit: Provide helper for dropping mark's chunk reference
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (6 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 07/10] audit: Remove pointless check in insert_hash() Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-10 10:02 ` [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk Jan Kara
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Provide a helper function audit_mark_put_chunk() for dropping mark's
reference (which has to happen only after RCU grace period expires).
Currently that happens only from a single place but in later patches we
introduce more callers.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b7977cf30479..bce3b04a365d 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -132,10 +132,20 @@ static void __put_chunk(struct rcu_head *rcu)
 	audit_put_chunk(chunk);
 }
 
+/*
+ * Drop reference to the chunk that was held by the mark. This is the reference
+ * that gets dropped after we've removed the chunk from the hash table and we
+ * use it to make sure chunk cannot be freed before RCU grace period expires.
+ */
+static void audit_mark_put_chunk(struct audit_chunk *chunk)
+{
+	call_rcu(&chunk->head, __put_chunk);
+}
+
 static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
 {
 	struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
-	call_rcu(&chunk->head, __put_chunk);
+	audit_mark_put_chunk(chunk);
 }
 
 static struct audit_chunk *alloc_chunk(int count)
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (7 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 08/10] audit: Provide helper for dropping mark's chunk reference Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-11  8:57   ` Amir Goldstein
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark Jan Kara
  2018-07-10 10:02 ` [PATCH 11/10 TESTSUITE] audit_testsuite: Add stress test for tree watches Jan Kara
  10 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Allocate fsnotify mark independently instead of embedding it inside
chunk. This will allow us to just replace chunk attached to mark when
growing / shrinking chunk instead of replacing mark attached to inode
which is a more complex operation.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce3b04a365d..aec9b27a20ff 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -25,7 +25,7 @@ struct audit_tree {
 struct audit_chunk {
 	struct list_head hash;
 	unsigned long key;
-	struct fsnotify_mark mark;
+	struct fsnotify_mark *mark;
 	struct list_head trees;		/* with root here */
 	int dead;
 	int count;
@@ -38,6 +38,11 @@ struct audit_chunk {
 	} owners[];
 };
 
+struct audit_tree_mark {
+	struct fsnotify_mark fsn_mark;
+	struct audit_chunk *chunk;
+};
+
 static LIST_HEAD(tree_list);
 static LIST_HEAD(prune_list);
 static struct task_struct *prune_thread;
@@ -73,6 +78,7 @@ static struct task_struct *prune_thread;
  */
 
 static struct fsnotify_group *audit_tree_group;
+static struct kmem_cache *audit_tree_mark_cachep __read_mostly;
 
 static struct audit_tree *alloc_tree(const char *s)
 {
@@ -142,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
 	call_rcu(&chunk->head, __put_chunk);
 }
 
+static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
+{
+	return container_of(entry, struct audit_tree_mark, fsn_mark);
+}
+
 static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
 {
-	struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+	struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
 	audit_mark_put_chunk(chunk);
+	kmem_cache_free(audit_tree_mark_cachep, entry);
+}
+
+static struct fsnotify_mark *alloc_fsnotify_mark(void)
+{
+	struct audit_tree_mark *mark;
+
+	mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
+	if (!mark)
+		return NULL;
+	fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
+	mark->fsn_mark.mask = FS_IN_IGNORED;
+	return &mark->fsn_mark;
 }
 
 static struct audit_chunk *alloc_chunk(int count)
@@ -159,6 +183,13 @@ static struct audit_chunk *alloc_chunk(int count)
 	if (!chunk)
 		return NULL;
 
+	chunk->mark = alloc_fsnotify_mark();
+	if (!chunk->mark) {
+		kfree(chunk);
+		return NULL;
+	}
+	AUDIT_M(chunk->mark)->chunk = chunk;
+
 	INIT_LIST_HEAD(&chunk->hash);
 	INIT_LIST_HEAD(&chunk->trees);
 	chunk->count = count;
@@ -167,8 +198,6 @@ static struct audit_chunk *alloc_chunk(int count)
 		INIT_LIST_HEAD(&chunk->owners[i].list);
 		chunk->owners[i].index = i;
 	}
-	fsnotify_init_mark(&chunk->mark, audit_tree_group);
-	chunk->mark.mask = FS_IN_IGNORED;
 	return chunk;
 }
 
@@ -278,7 +307,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
 static void untag_chunk(struct node *p)
 {
 	struct audit_chunk *chunk = find_chunk(p);
-	struct fsnotify_mark *entry = &chunk->mark;
+	struct fsnotify_mark *entry = chunk->mark;
 	struct audit_chunk *new = NULL;
 	struct audit_tree *owner;
 	int size = chunk->count - 1;
@@ -298,7 +327,7 @@ static void untag_chunk(struct node *p)
 	if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
 		mutex_unlock(&entry->group->mark_mutex);
 		if (new)
-			fsnotify_put_mark(&new->mark);
+			fsnotify_put_mark(new->mark);
 		goto out;
 	}
 
@@ -322,9 +351,9 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark_locked(&new->mark, entry->connector->obj,
+	if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
 				     FSNOTIFY_OBJ_TYPE_INODE, 1)) {
-		fsnotify_put_mark(&new->mark);
+		fsnotify_put_mark(new->mark);
 		goto Fallback;
 	}
 
@@ -344,7 +373,7 @@ static void untag_chunk(struct node *p)
 	fsnotify_detach_mark(entry);
 	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_free_mark(entry);
-	fsnotify_put_mark(&new->mark);	/* drop initial reference */
+	fsnotify_put_mark(new->mark);	/* drop initial reference */
 	goto out;
 
 Fallback:
@@ -375,7 +404,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOMEM;
 	}
 
-	entry = &chunk->mark;
+	entry = chunk->mark;
 	if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
 		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(entry);
@@ -426,7 +455,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	if (!old_entry)
 		return create_chunk(inode, tree);
 
-	old = container_of(old_entry, struct audit_chunk, mark);
+	old = AUDIT_M(old_entry)->chunk;
 
 	/* are we already there? */
 	spin_lock(&hash_lock);
@@ -447,7 +476,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOMEM;
 	}
 
-	chunk_entry = &chunk->mark;
+	chunk_entry = chunk->mark;
 
 	/*
 	 * mark_mutex protects mark from getting detached and thus also from
@@ -457,7 +486,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		/* old_entry is being shot, lets just lie */
 		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(old_entry);
-		fsnotify_put_mark(&chunk->mark);
+		fsnotify_put_mark(chunk->mark);
 		return -ENOENT;
 	}
 
@@ -1009,7 +1038,7 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
 
 static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
 {
-	struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
+	struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
 
 	evict_chunk(chunk);
 
@@ -1030,6 +1059,8 @@ static int __init audit_tree_init(void)
 {
 	int i;
 
+	audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
+
 	audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
 	if (IS_ERR(audit_tree_group))
 		audit_panic("cannot initialize fsnotify group for rectree watches");
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (8 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  2018-07-11 14:17   ` Amir Goldstein
  2018-07-27  4:47   ` Paul Moore
  2018-07-10 10:02 ` [PATCH 11/10 TESTSUITE] audit_testsuite: Add stress test for tree watches Jan Kara
  10 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Audit tree code currently associates new fsnotify mark with each new
chunk. As chunk attached to an inode is replaced when new tag is added /
removed, we also need to remove old fsnotify mark and add a new one on
such occasion.  This is cumbersome and makes locking rules somewhat
difficult to follow.

Fix these problems by allocating fsnotify mark independently of chunk
and keeping it all the time while there is some chunk attached to an
inode. Also add documentation about the locking rules so that things are
easier to follow.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 163 +++++++++++++++++++++++++++-------------------------
 1 file changed, 85 insertions(+), 78 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index aec9b27a20ff..40f61de77dd0 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -27,7 +27,6 @@ struct audit_chunk {
 	unsigned long key;
 	struct fsnotify_mark *mark;
 	struct list_head trees;		/* with root here */
-	int dead;
 	int count;
 	atomic_long_t refs;
 	struct rcu_head head;
@@ -48,8 +47,15 @@ static LIST_HEAD(prune_list);
 static struct task_struct *prune_thread;
 
 /*
- * One struct chunk is attached to each inode of interest.
- * We replace struct chunk on tagging/untagging.
+ * One struct chunk is attached to each inode of interest through
+ * audit_tree_mark (fsnotify mark). We replace struct chunk on tagging /
+ * untagging, the mark is stable as long as there is chunk attached. The
+ * association between mark and chunk is protected by hash_lock and
+ * audit_tree_group->mark_mutex. Thus as long as we hold
+ * audit_tree_group->mark_mutex and check that the mark is alive by
+ * FSNOTIFY_MARK_FLAG_ATTACHED flag check, we are sure the mark points to
+ * the current chunk.
+ *
  * Rules have pointer to struct audit_tree.
  * Rules have struct list_head rlist forming a list of rules over
  * the same tree.
@@ -68,8 +74,12 @@ static struct task_struct *prune_thread;
  * tree is refcounted; one reference for "some rules on rules_list refer to
  * it", one for each chunk with pointer to it.
  *
- * chunk is refcounted by embedded fsnotify_mark + .refs (non-zero refcount
- * of watch contributes 1 to .refs).
+ * chunk is refcounted by embedded .refs. Mark associated with the chunk holds
+ * one chunk reference. This reference is dropped either when a mark is going
+ * to be freed (corresponding inode goes away) or when chunk attached to the
+ * mark gets replaced. This reference must be dropped using
+ * audit_mark_put_chunk() to make sure the reference is dropped only after RCU
+ * grace period as it protects RCU readers of the hash table.
  *
  * node.index allows to get from node.list to containing chunk.
  * MSB of that sucker is stolen to mark taggings that we might have to
@@ -155,8 +165,6 @@ static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
 
 static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
 {
-	struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
-	audit_mark_put_chunk(chunk);
 	kmem_cache_free(audit_tree_mark_cachep, entry);
 }
 
@@ -183,13 +191,6 @@ static struct audit_chunk *alloc_chunk(int count)
 	if (!chunk)
 		return NULL;
 
-	chunk->mark = alloc_fsnotify_mark();
-	if (!chunk->mark) {
-		kfree(chunk);
-		return NULL;
-	}
-	AUDIT_M(chunk->mark)->chunk = chunk;
-
 	INIT_LIST_HEAD(&chunk->hash);
 	INIT_LIST_HEAD(&chunk->trees);
 	chunk->count = count;
@@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
 	return container_of(p, struct audit_chunk, owners[0]);
 }
 
+static void replace_mark_chunk(struct fsnotify_mark *entry,
+			       struct audit_chunk *chunk)
+{
+	struct audit_chunk *old;
+
+	assert_spin_locked(&hash_lock);
+	old = AUDIT_M(entry)->chunk;
+	AUDIT_M(entry)->chunk = chunk;
+	if (chunk)
+		chunk->mark = entry;
+	if (old)
+		old->mark = NULL;
+}
+
 static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
 			  struct node *skip)
 {
@@ -295,6 +310,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
 		get_tree(owner);
 		list_replace_init(&old->owners[j].list, &new->owners[i].list);
 	}
+	replace_mark_chunk(old->mark, new);
 	/*
 	 * Make sure chunk is fully initialized before making it visible in the
 	 * hash. Pairs with a data dependency barrier in READ_ONCE() in
@@ -312,6 +328,10 @@ static void untag_chunk(struct node *p)
 	struct audit_tree *owner;
 	int size = chunk->count - 1;
 
+	/* Racing with audit_tree_freeing_mark()? */
+	if (!entry)
+		return;
+
 	fsnotify_get_mark(entry);
 
 	spin_unlock(&hash_lock);
@@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)
 
 	mutex_lock(&entry->group->mark_mutex);
 	/*
-	 * mark_mutex protects mark from getting detached and thus also from
-	 * mark->connector->obj getting NULL.
+	 * mark_mutex protects mark stabilizes chunk attached to the mark so we
+	 * can check whether it didn't change while we've dropped hash_lock.
 	 */
-	if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
+	if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
+	    AUDIT_M(entry)->chunk != chunk) {
 		mutex_unlock(&entry->group->mark_mutex);
 		if (new)
-			fsnotify_put_mark(new->mark);
+			kfree(new);
 		goto out;
 	}
 
 	owner = p->owner;
 
 	if (!size) {
-		chunk->dead = 1;
 		spin_lock(&hash_lock);
 		list_del_init(&chunk->trees);
 		if (owner->root == chunk)
 			owner->root = NULL;
 		list_del_init(&p->list);
 		list_del_rcu(&chunk->hash);
+		replace_mark_chunk(entry, NULL);
 		spin_unlock(&hash_lock);
 		fsnotify_detach_mark(entry);
 		mutex_unlock(&entry->group->mark_mutex);
+		audit_mark_put_chunk(chunk);
 		fsnotify_free_mark(entry);
 		goto out;
 	}
@@ -351,13 +373,6 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
-				     FSNOTIFY_OBJ_TYPE_INODE, 1)) {
-		fsnotify_put_mark(new->mark);
-		goto Fallback;
-	}
-
-	chunk->dead = 1;
 	spin_lock(&hash_lock);
 	if (owner->root == chunk) {
 		list_del_init(&owner->same_root);
@@ -370,10 +385,8 @@ static void untag_chunk(struct node *p)
 	 */
 	replace_chunk(new, chunk, p);
 	spin_unlock(&hash_lock);
-	fsnotify_detach_mark(entry);
 	mutex_unlock(&entry->group->mark_mutex);
-	fsnotify_free_mark(entry);
-	fsnotify_put_mark(new->mark);	/* drop initial reference */
+	audit_mark_put_chunk(chunk);
 	goto out;
 
 Fallback:
@@ -404,23 +417,31 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOMEM;
 	}
 
-	entry = chunk->mark;
+	entry = alloc_fsnotify_mark();
+	if (!entry) {
+		mutex_unlock(&audit_tree_group->mark_mutex);
+		kfree(chunk);
+		return -ENOMEM;
+	}
+
 	if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
 		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_put_mark(entry);
+		kfree(chunk);
 		return -ENOSPC;
 	}
 
 	spin_lock(&hash_lock);
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
-		chunk->dead = 1;
 		fsnotify_detach_mark(entry);
 		mutex_unlock(&audit_tree_group->mark_mutex);
 		fsnotify_free_mark(entry);
 		fsnotify_put_mark(entry);
+		kfree(chunk);
 		return 0;
 	}
+	replace_mark_chunk(entry, chunk);
 	chunk->owners[0].index = (1U << 31);
 	chunk->owners[0].owner = tree;
 	get_tree(tree);
@@ -437,33 +458,41 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
 	mutex_unlock(&audit_tree_group->mark_mutex);
-	fsnotify_put_mark(entry);	/* drop initial reference */
+	/*
+	 * Drop our initial reference. When mark we point to is getting freed,
+	 * we get notification through ->freeing_mark callback and cleanup
+	 * chunk pointing to this mark.
+	 */
+	fsnotify_put_mark(entry);
 	return 0;
 }
 
 /* the first tagged inode becomes root of tree */
 static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 {
-	struct fsnotify_mark *old_entry, *chunk_entry;
+	struct fsnotify_mark *entry;
 	struct audit_chunk *chunk, *old;
 	struct node *p;
 	int n;
 
 	mutex_lock(&audit_tree_group->mark_mutex);
-	old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
-				       audit_tree_group);
-	if (!old_entry)
+	entry = fsnotify_find_mark(&inode->i_fsnotify_marks, audit_tree_group);
+	if (!entry)
 		return create_chunk(inode, tree);
 
-	old = AUDIT_M(old_entry)->chunk;
-
+	/*
+	 * Found mark is guaranteed to be attached and mark_mutex protects mark
+	 * from getting detached and thus it makes sure there is chunk attached
+	 * to the mark.
+	 */
 	/* are we already there? */
 	spin_lock(&hash_lock);
+	old = AUDIT_M(entry)->chunk;
 	for (n = 0; n < old->count; n++) {
 		if (old->owners[n].owner == tree) {
 			spin_unlock(&hash_lock);
 			mutex_unlock(&audit_tree_group->mark_mutex);
-			fsnotify_put_mark(old_entry);
+			fsnotify_put_mark(entry);
 			return 0;
 		}
 	}
@@ -472,41 +501,16 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	chunk = alloc_chunk(old->count + 1);
 	if (!chunk) {
 		mutex_unlock(&audit_tree_group->mark_mutex);
-		fsnotify_put_mark(old_entry);
+		fsnotify_put_mark(entry);
 		return -ENOMEM;
 	}
 
-	chunk_entry = chunk->mark;
-
-	/*
-	 * mark_mutex protects mark from getting detached and thus also from
-	 * mark->connector->obj getting NULL.
-	 */
-	if (!(old_entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
-		/* old_entry is being shot, lets just lie */
-		mutex_unlock(&audit_tree_group->mark_mutex);
-		fsnotify_put_mark(old_entry);
-		fsnotify_put_mark(chunk->mark);
-		return -ENOENT;
-	}
-
-	if (fsnotify_add_mark_locked(chunk_entry, old_entry->connector->obj,
-				     FSNOTIFY_OBJ_TYPE_INODE, 1)) {
-		mutex_unlock(&audit_tree_group->mark_mutex);
-		fsnotify_put_mark(chunk_entry);
-		fsnotify_put_mark(old_entry);
-		return -ENOSPC;
-	}
-
 	spin_lock(&hash_lock);
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
-		chunk->dead = 1;
-		fsnotify_detach_mark(chunk_entry);
 		mutex_unlock(&audit_tree_group->mark_mutex);
-		fsnotify_free_mark(chunk_entry);
-		fsnotify_put_mark(chunk_entry);
-		fsnotify_put_mark(old_entry);
+		fsnotify_put_mark(entry);
+		kfree(chunk);
 		return 0;
 	}
 	p = &chunk->owners[chunk->count - 1];
@@ -514,7 +518,6 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	p->owner = tree;
 	get_tree(tree);
 	list_add(&p->list, &tree->chunks);
-	old->dead = 1;
 	if (!tree->root) {
 		tree->root = chunk;
 		list_add(&tree->same_root, &chunk->trees);
@@ -525,11 +528,10 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	 */
 	replace_chunk(chunk, old, NULL);
 	spin_unlock(&hash_lock);
-	fsnotify_detach_mark(old_entry);
 	mutex_unlock(&audit_tree_group->mark_mutex);
-	fsnotify_free_mark(old_entry);
-	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
-	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
+	fsnotify_put_mark(entry); /* pair to fsnotify_find mark_entry */
+	audit_mark_put_chunk(old);
+
 	return 0;
 }
 
@@ -996,10 +998,6 @@ static void evict_chunk(struct audit_chunk *chunk)
 	int need_prune = 0;
 	int n;
 
-	if (chunk->dead)
-		return;
-
-	chunk->dead = 1;
 	mutex_lock(&audit_filter_mutex);
 	spin_lock(&hash_lock);
 	while (!list_empty(&chunk->trees)) {
@@ -1038,9 +1036,18 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
 
 static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
 {
-	struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
+	struct audit_chunk *chunk;
 
-	evict_chunk(chunk);
+	mutex_lock(&entry->group->mark_mutex);
+	spin_lock(&hash_lock);
+	chunk = AUDIT_M(entry)->chunk;
+	replace_mark_chunk(entry, NULL);
+	spin_unlock(&hash_lock);
+	mutex_unlock(&entry->group->mark_mutex);
+	if (chunk) {
+		evict_chunk(chunk);
+		audit_mark_put_chunk(chunk);
+	}
 
 	/*
 	 * We are guaranteed to have at least one reference to the mark from
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/10 TESTSUITE] audit_testsuite: Add stress test for tree watches
  2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
                   ` (9 preceding siblings ...)
  2018-07-10 10:02 ` [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark Jan Kara
@ 2018-07-10 10:02 ` Jan Kara
  10 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-10 10:02 UTC (permalink / raw)
  To: linux-audit
  Cc: Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs,
	Amir Goldstein, Jan Kara

Add stress test for stressing audit tree watches by adding and deleting
rules while events are generated and watched filesystems are mounted and
unmounted in parallel.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 tests/stress_tree/Makefile |   8 +++
 tests/stress_tree/test     | 171 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+)
 create mode 100644 tests/stress_tree/Makefile
 create mode 100755 tests/stress_tree/test

diff --git a/tests/stress_tree/Makefile b/tests/stress_tree/Makefile
new file mode 100644
index 000000000000..7ade09aad86f
--- /dev/null
+++ b/tests/stress_tree/Makefile
@@ -0,0 +1,8 @@
+TARGETS=$(patsubst %.c,%,$(wildcard *.c))
+
+LDLIBS += -lpthread
+
+all: $(TARGETS)
+clean:
+	rm -f $(TARGETS)
+
diff --git a/tests/stress_tree/test b/tests/stress_tree/test
new file mode 100755
index 000000000000..6215bec810d1
--- /dev/null
+++ b/tests/stress_tree/test
@@ -0,0 +1,171 @@
+#!/usr/bin/perl
+
+use strict;
+
+use Test;
+BEGIN { plan tests => 1 }
+
+use File::Temp qw/ tempdir tempfile /;
+
+###
+# functions
+
+sub key_gen {
+	my @chars = ( "A" .. "Z", "a" .. "z" );
+	my $key = "testsuite-" . time . "-";
+	$key .= $chars[ rand @chars ] for 1 .. 8;
+	return $key;
+}
+
+# Run stat on random files in subtrees to generate audit events
+sub run_stat {
+	my($dir,$dirs) = @_;
+	my $path;
+
+	while (1) {
+		$path = "$dir/mnt/mnt".int(rand($dirs))."/subdir".int(rand($dirs));
+		stat($path);
+	}
+}
+
+# Generate audit rules for subtrees. Do one rule per subtree. Because watch
+# recursively iterates child mounts and we mount $dir/leaf$i under various
+# subtrees, the inode corresponding to $dir/leaf$i gets tagged by different
+# trees.
+sub run_mark_audit {
+	my($dir,$dirs,$key) = @_;
+
+	while (1) {
+		for (my $i=0; $i < $dirs; $i++) {
+			system("auditctl -w $dir/mnt/mnt$i -p r -k $key");
+		}
+		system("auditctl -D -k $key >& /dev/null");
+	}
+}
+
+sub umount_all {
+	my($dir,$dirs,$ignore_fail) = @_;
+
+	for (my $i=0; $i < $dirs; $i++) {
+		while (system("umount $dir/leaf$i >& /dev/null") > 0 &&
+		       $ignore_fail == 0) {
+			# Nothing - loop until umount succeeds
+		}
+	}
+	for (my $i=0; $i < $dirs; $i++) {
+		for (my $j=0; $j < $dirs; $j++) {
+			while (system("umount $dir/mnt/mnt$i/subdir$j >& /dev/null") > 0 &&
+			       $ignore_fail == 0) {
+				# Nothing - loop until umount succeeds
+			}
+		}
+		while (system("umount $dir/mnt/mnt$i >& /dev/null") > 0 &&
+		       $ignore_fail == 0) {
+			# Nothing - loop until umount succeeds
+		}
+	}
+}
+
+# Mount and unmount filesystems. We pick random leaf mount so that sometimes
+# a leaf mount point root inode will gather more tags from different trees
+# and sometimes we will be quicker in unmounting all instances of leaf and
+# thus excercise inode evistion path
+sub run_mount {
+	my($dir,$dirs) = @_;
+
+	while (1) {
+		# We use tmpfs here and not just bind mounts of some dir so
+		# that the root inode gets evicted once all instances are
+		# unmounted.
+		for (my $i=0; $i < $dirs; $i++) {
+			system("mount -t tmpfs none $dir/leaf$i");
+		}
+		for (my $i=0; $i < $dirs; $i++) {
+			system("mount --bind $dir/dir$i $dir/mnt/mnt$i");
+			for (my $j=0; $j < $dirs; $j++) {
+				my $leaf="$dir/leaf".int(rand($dirs));
+				system("mount --bind $leaf $dir/mnt/mnt$i/subdir$j");
+			}
+		}
+		umount_all($dir, $dirs, 0);
+	}
+}
+
+
+###
+# setup
+
+# reset audit
+system("auditctl -D >& /dev/null");
+
+# create temp directory
+my $dir = tempdir( TEMPLATE => '/tmp/audit-testsuite-XXXX', CLEANUP => 1 );
+
+# create stdout/stderr sinks
+( my $fh_out, my $stdout ) = tempfile(
+	TEMPLATE => '/tmp/audit-testsuite-out-XXXX',
+	UNLINK   => 1
+);
+( my $fh_err, my $stderr ) = tempfile(
+	TEMPLATE => '/tmp/audit-testsuite-err-XXXX',
+	UNLINK   => 1
+);
+
+###
+# tests
+
+my $dirs = 4;
+
+# setup directory hierarchy
+for (my $i=0; $i < $dirs; $i++) {
+	mkdir $dir."/dir".$i;
+	for (my $j=0; $j < $dirs; $j++) {
+		mkdir $dir."/dir".$i."/subdir".$j;
+	}
+}
+mkdir "$dir/mnt";
+for (my $i=0; $i < $dirs; $i++) {
+	mkdir "$dir/mnt/mnt$i";
+	mkdir "$dir/leaf$i";
+}
+
+my $stat_pid = fork();
+
+if ($stat_pid == 0) {
+	run_stat($dir, $dirs);
+	# Never reached
+	exit;
+}
+
+my $mount_pid = fork();
+
+if ($mount_pid == 0) {
+	run_mount($dir, $dirs);
+	# Never reached
+	exit;
+}
+
+my $key = key_gen();
+
+my $audit_pid = fork();
+
+if ($audit_pid == 0) {
+	run_mark_audit($dir, $dirs, $key);
+	# Never reached
+	exit;
+}
+
+# Sleep for a minute to let stress test run...
+sleep(60);
+ok(1);
+
+###
+# cleanup
+
+kill('KILL', $stat_pid, $mount_pid, $audit_pid);
+# Wait for children to terminate
+waitpid($stat_pid, 0);
+waitpid($mount_pid, 0);
+waitpid($audit_pid, 0);
+system("auditctl -D >& /dev/null");
+umount_all($dir, $dirs, 1);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-10 10:02 ` [PATCH 06/10] audit: Factor out chunk replacement code Jan Kara
@ 2018-07-11  7:58   ` Amir Goldstein
  2018-07-11  8:26     ` Jan Kara
  2018-07-27  4:47   ` Paul Moore
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2018-07-11  7:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Audit, Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs

On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> Chunk replacement code is very similar for the cases where we grow or
> shrink chunk. Factor the code out into a common helper function.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Ack. Nice re-factoring.

Thanks,
Amir.

> ---
>  kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
>  1 file changed, 40 insertions(+), 46 deletions(-)
>
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 6a01738d1ac2..f419fdfc25b4 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
>         return container_of(p, struct audit_chunk, owners[0]);
>  }
>
> +static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
> +                         struct node *skip)
> +{
> +       struct audit_tree *owner;
> +       int i, j;
> +
> +       new->key = old->key;
> +       list_replace_init(&old->trees, &new->trees);
> +       list_for_each_entry(owner, &new->trees, same_root)
> +               owner->root = new;
> +       for (i = j = 0; j < old->count; i++, j++) {
> +               if (&old->owners[j] == skip) {
> +                       i--;
> +                       continue;
> +               }
> +               owner = old->owners[j].owner;
> +               new->owners[i].owner = owner;
> +               new->owners[i].index = old->owners[j].index - j + i;
> +               if (!owner) /* result of earlier fallback */
> +                       continue;
> +               get_tree(owner);
> +               list_replace_init(&old->owners[j].list, &new->owners[i].list);
> +       }
> +       /*
> +        * Make sure chunk is fully initialized before making it visible in the
> +        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> +        * audit_tree_lookup().
> +        */
> +       smp_wmb();
> +       list_replace_rcu(&old->hash, &new->hash);
> +}
> +
>  static void untag_chunk(struct node *p)
>  {
>         struct audit_chunk *chunk = find_chunk(p);
> @@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
>         struct audit_chunk *new = NULL;
>         struct audit_tree *owner;
>         int size = chunk->count - 1;
> -       int i, j;
>
>         fsnotify_get_mark(entry);
>
> @@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
>
>         chunk->dead = 1;
>         spin_lock(&hash_lock);
> -       new->key = chunk->key;
> -       list_replace_init(&chunk->trees, &new->trees);
>         if (owner->root == chunk) {
>                 list_del_init(&owner->same_root);
>                 owner->root = NULL;
>         }
> -
> -       for (i = j = 0; j <= size; i++, j++) {
> -               struct audit_tree *s;
> -               if (&chunk->owners[j] == p) {
> -                       list_del_init(&p->list);
> -                       i--;
> -                       continue;
> -               }
> -               s = chunk->owners[j].owner;
> -               new->owners[i].owner = s;
> -               new->owners[i].index = chunk->owners[j].index - j + i;
> -               if (!s) /* result of earlier fallback */
> -                       continue;
> -               get_tree(s);
> -               list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
> -       }
> -
> -       list_for_each_entry(owner, &new->trees, same_root)
> -               owner->root = new;
> +       list_del_init(&p->list);
>         /*
> -        * Make sure chunk is fully initialized before making it visible in the
> -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> -        * audit_tree_lookup().
> +        * This has to go last when updating chunk as once replace_chunk() is
> +        * called, new RCU readers can see the new chunk.
>          */
> -       smp_wmb();
> -       list_replace_rcu(&chunk->hash, &new->hash);
> +       replace_chunk(new, chunk, p);
>         spin_unlock(&hash_lock);
>         fsnotify_detach_mark(entry);
>         mutex_unlock(&entry->group->mark_mutex);
> @@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
>  static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>  {
>         struct fsnotify_mark *old_entry, *chunk_entry;
> -       struct audit_tree *owner;
>         struct audit_chunk *chunk, *old;
>         struct node *p;
>         int n;
> @@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 fsnotify_put_mark(old_entry);
>                 return 0;
>         }
> -       chunk->key = old->key;
> -       list_replace_init(&old->trees, &chunk->trees);
> -       for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
> -               struct audit_tree *s = old->owners[n].owner;
> -               p->owner = s;
> -               p->index = old->owners[n].index;
> -               if (!s) /* result of fallback in untag */
> -                       continue;
> -               get_tree(s);
> -               list_replace_init(&old->owners[n].list, &p->list);
> -       }
> +       p = &chunk->owners[chunk->count - 1];
>         p->index = (chunk->count - 1) | (1U<<31);
>         p->owner = tree;
>         get_tree(tree);
>         list_add(&p->list, &tree->chunks);
> -       list_for_each_entry(owner, &chunk->trees, same_root)
> -               owner->root = chunk;
>         old->dead = 1;
>         if (!tree->root) {
>                 tree->root = chunk;
>                 list_add(&tree->same_root, &chunk->trees);
>         }
>         /*
> -        * Make sure chunk is fully initialized before making it visible in the
> -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> -        * audit_tree_lookup().
> +        * This has to go last when updating chunk as once replace_chunk() is
> +        * called, new RCU readers can see the new chunk.
>          */
> -       smp_wmb();
> -       list_replace_rcu(&old->hash, &chunk->hash);
> +       replace_chunk(chunk, old, NULL);
>         spin_unlock(&hash_lock);
>         fsnotify_detach_mark(old_entry);
>         mutex_unlock(&audit_tree_group->mark_mutex);
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-11  7:58   ` Amir Goldstein
@ 2018-07-11  8:26     ` Jan Kara
  2018-07-11  9:01       ` Amir Goldstein
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-07-11  8:26 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Linux Audit, Paul Moore, linux-fsdevel, Al Viro,
	Richard Guy Briggs

On Wed 11-07-18 10:58:24, Amir Goldstein wrote:
> On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> > Chunk replacement code is very similar for the cases where we grow or
> > shrink chunk. Factor the code out into a common helper function.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Ack. Nice re-factoring.

Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
me whether you meant them as an official tag-worthy statement. So do want
me to add the tags or do you prefer to stay "anonymous reviewer" ;)?

								Honza

> 
> Thanks,
> Amir.
> 
> > ---
> >  kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
> >  1 file changed, 40 insertions(+), 46 deletions(-)
> >
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 6a01738d1ac2..f419fdfc25b4 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
> >         return container_of(p, struct audit_chunk, owners[0]);
> >  }
> >
> > +static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
> > +                         struct node *skip)
> > +{
> > +       struct audit_tree *owner;
> > +       int i, j;
> > +
> > +       new->key = old->key;
> > +       list_replace_init(&old->trees, &new->trees);
> > +       list_for_each_entry(owner, &new->trees, same_root)
> > +               owner->root = new;
> > +       for (i = j = 0; j < old->count; i++, j++) {
> > +               if (&old->owners[j] == skip) {
> > +                       i--;
> > +                       continue;
> > +               }
> > +               owner = old->owners[j].owner;
> > +               new->owners[i].owner = owner;
> > +               new->owners[i].index = old->owners[j].index - j + i;
> > +               if (!owner) /* result of earlier fallback */
> > +                       continue;
> > +               get_tree(owner);
> > +               list_replace_init(&old->owners[j].list, &new->owners[i].list);
> > +       }
> > +       /*
> > +        * Make sure chunk is fully initialized before making it visible in the
> > +        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> > +        * audit_tree_lookup().
> > +        */
> > +       smp_wmb();
> > +       list_replace_rcu(&old->hash, &new->hash);
> > +}
> > +
> >  static void untag_chunk(struct node *p)
> >  {
> >         struct audit_chunk *chunk = find_chunk(p);
> > @@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
> >         struct audit_chunk *new = NULL;
> >         struct audit_tree *owner;
> >         int size = chunk->count - 1;
> > -       int i, j;
> >
> >         fsnotify_get_mark(entry);
> >
> > @@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
> >
> >         chunk->dead = 1;
> >         spin_lock(&hash_lock);
> > -       new->key = chunk->key;
> > -       list_replace_init(&chunk->trees, &new->trees);
> >         if (owner->root == chunk) {
> >                 list_del_init(&owner->same_root);
> >                 owner->root = NULL;
> >         }
> > -
> > -       for (i = j = 0; j <= size; i++, j++) {
> > -               struct audit_tree *s;
> > -               if (&chunk->owners[j] == p) {
> > -                       list_del_init(&p->list);
> > -                       i--;
> > -                       continue;
> > -               }
> > -               s = chunk->owners[j].owner;
> > -               new->owners[i].owner = s;
> > -               new->owners[i].index = chunk->owners[j].index - j + i;
> > -               if (!s) /* result of earlier fallback */
> > -                       continue;
> > -               get_tree(s);
> > -               list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
> > -       }
> > -
> > -       list_for_each_entry(owner, &new->trees, same_root)
> > -               owner->root = new;
> > +       list_del_init(&p->list);
> >         /*
> > -        * Make sure chunk is fully initialized before making it visible in the
> > -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> > -        * audit_tree_lookup().
> > +        * This has to go last when updating chunk as once replace_chunk() is
> > +        * called, new RCU readers can see the new chunk.
> >          */
> > -       smp_wmb();
> > -       list_replace_rcu(&chunk->hash, &new->hash);
> > +       replace_chunk(new, chunk, p);
> >         spin_unlock(&hash_lock);
> >         fsnotify_detach_mark(entry);
> >         mutex_unlock(&entry->group->mark_mutex);
> > @@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
> >  static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >  {
> >         struct fsnotify_mark *old_entry, *chunk_entry;
> > -       struct audit_tree *owner;
> >         struct audit_chunk *chunk, *old;
> >         struct node *p;
> >         int n;
> > @@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >                 fsnotify_put_mark(old_entry);
> >                 return 0;
> >         }
> > -       chunk->key = old->key;
> > -       list_replace_init(&old->trees, &chunk->trees);
> > -       for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
> > -               struct audit_tree *s = old->owners[n].owner;
> > -               p->owner = s;
> > -               p->index = old->owners[n].index;
> > -               if (!s) /* result of fallback in untag */
> > -                       continue;
> > -               get_tree(s);
> > -               list_replace_init(&old->owners[n].list, &p->list);
> > -       }
> > +       p = &chunk->owners[chunk->count - 1];
> >         p->index = (chunk->count - 1) | (1U<<31);
> >         p->owner = tree;
> >         get_tree(tree);
> >         list_add(&p->list, &tree->chunks);
> > -       list_for_each_entry(owner, &chunk->trees, same_root)
> > -               owner->root = chunk;
> >         old->dead = 1;
> >         if (!tree->root) {
> >                 tree->root = chunk;
> >                 list_add(&tree->same_root, &chunk->trees);
> >         }
> >         /*
> > -        * Make sure chunk is fully initialized before making it visible in the
> > -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> > -        * audit_tree_lookup().
> > +        * This has to go last when updating chunk as once replace_chunk() is
> > +        * called, new RCU readers can see the new chunk.
> >          */
> > -       smp_wmb();
> > -       list_replace_rcu(&old->hash, &chunk->hash);
> > +       replace_chunk(chunk, old, NULL);
> >         spin_unlock(&hash_lock);
> >         fsnotify_detach_mark(old_entry);
> >         mutex_unlock(&audit_tree_group->mark_mutex);
> > --
> > 2.16.4
> >
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-10 10:02 ` [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk Jan Kara
@ 2018-07-11  8:57   ` Amir Goldstein
  2018-07-11 10:48     ` Amir Goldstein
  2018-07-27  4:47   ` Paul Moore
  1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2018-07-11  8:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Audit, Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs

On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> Allocate fsnotify mark independently instead of embedding it inside
> chunk. This will allow us to just replace chunk attached to mark when
> growing / shrinking chunk instead of replacing mark attached to inode
> which is a more complex operation.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Ack.

Thanks for separating this patch.
Amir.

> ---
>  kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index bce3b04a365d..aec9b27a20ff 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -25,7 +25,7 @@ struct audit_tree {
>  struct audit_chunk {
>         struct list_head hash;
>         unsigned long key;
> -       struct fsnotify_mark mark;
> +       struct fsnotify_mark *mark;
>         struct list_head trees;         /* with root here */
>         int dead;
>         int count;
> @@ -38,6 +38,11 @@ struct audit_chunk {
>         } owners[];
>  };
>
> +struct audit_tree_mark {
> +       struct fsnotify_mark fsn_mark;
> +       struct audit_chunk *chunk;
> +};
> +
>  static LIST_HEAD(tree_list);
>  static LIST_HEAD(prune_list);
>  static struct task_struct *prune_thread;
> @@ -73,6 +78,7 @@ static struct task_struct *prune_thread;
>   */
>
>  static struct fsnotify_group *audit_tree_group;
> +static struct kmem_cache *audit_tree_mark_cachep __read_mostly;
>
>  static struct audit_tree *alloc_tree(const char *s)
>  {
> @@ -142,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
>         call_rcu(&chunk->head, __put_chunk);
>  }
>
> +static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
> +{
> +       return container_of(entry, struct audit_tree_mark, fsn_mark);
> +}
> +
>  static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
>  {
> -       struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
> +       struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
>         audit_mark_put_chunk(chunk);
> +       kmem_cache_free(audit_tree_mark_cachep, entry);
> +}
> +
> +static struct fsnotify_mark *alloc_fsnotify_mark(void)
> +{
> +       struct audit_tree_mark *mark;
> +
> +       mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
> +       if (!mark)
> +               return NULL;
> +       fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
> +       mark->fsn_mark.mask = FS_IN_IGNORED;
> +       return &mark->fsn_mark;
>  }
>
>  static struct audit_chunk *alloc_chunk(int count)
> @@ -159,6 +183,13 @@ static struct audit_chunk *alloc_chunk(int count)
>         if (!chunk)
>                 return NULL;
>
> +       chunk->mark = alloc_fsnotify_mark();
> +       if (!chunk->mark) {
> +               kfree(chunk);
> +               return NULL;
> +       }
> +       AUDIT_M(chunk->mark)->chunk = chunk;
> +
>         INIT_LIST_HEAD(&chunk->hash);
>         INIT_LIST_HEAD(&chunk->trees);
>         chunk->count = count;
> @@ -167,8 +198,6 @@ static struct audit_chunk *alloc_chunk(int count)
>                 INIT_LIST_HEAD(&chunk->owners[i].list);
>                 chunk->owners[i].index = i;
>         }
> -       fsnotify_init_mark(&chunk->mark, audit_tree_group);
> -       chunk->mark.mask = FS_IN_IGNORED;
>         return chunk;
>  }
>
> @@ -278,7 +307,7 @@ static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
>  static void untag_chunk(struct node *p)
>  {
>         struct audit_chunk *chunk = find_chunk(p);
> -       struct fsnotify_mark *entry = &chunk->mark;
> +       struct fsnotify_mark *entry = chunk->mark;
>         struct audit_chunk *new = NULL;
>         struct audit_tree *owner;
>         int size = chunk->count - 1;
> @@ -298,7 +327,7 @@ static void untag_chunk(struct node *p)
>         if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
>                 mutex_unlock(&entry->group->mark_mutex);
>                 if (new)
> -                       fsnotify_put_mark(&new->mark);
> +                       fsnotify_put_mark(new->mark);
>                 goto out;
>         }
>
> @@ -322,9 +351,9 @@ static void untag_chunk(struct node *p)
>         if (!new)
>                 goto Fallback;
>
> -       if (fsnotify_add_mark_locked(&new->mark, entry->connector->obj,
> +       if (fsnotify_add_mark_locked(new->mark, entry->connector->obj,
>                                      FSNOTIFY_OBJ_TYPE_INODE, 1)) {
> -               fsnotify_put_mark(&new->mark);
> +               fsnotify_put_mark(new->mark);
>                 goto Fallback;
>         }
>
> @@ -344,7 +373,7 @@ static void untag_chunk(struct node *p)
>         fsnotify_detach_mark(entry);
>         mutex_unlock(&entry->group->mark_mutex);
>         fsnotify_free_mark(entry);
> -       fsnotify_put_mark(&new->mark);  /* drop initial reference */
> +       fsnotify_put_mark(new->mark);   /* drop initial reference */
>         goto out;
>
>  Fallback:
> @@ -375,7 +404,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
>                 return -ENOMEM;
>         }
>
> -       entry = &chunk->mark;
> +       entry = chunk->mark;
>         if (fsnotify_add_inode_mark_locked(entry, inode, 0)) {
>                 mutex_unlock(&audit_tree_group->mark_mutex);
>                 fsnotify_put_mark(entry);
> @@ -426,7 +455,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>         if (!old_entry)
>                 return create_chunk(inode, tree);
>
> -       old = container_of(old_entry, struct audit_chunk, mark);
> +       old = AUDIT_M(old_entry)->chunk;
>
>         /* are we already there? */
>         spin_lock(&hash_lock);
> @@ -447,7 +476,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 return -ENOMEM;
>         }
>
> -       chunk_entry = &chunk->mark;
> +       chunk_entry = chunk->mark;
>
>         /*
>          * mark_mutex protects mark from getting detached and thus also from
> @@ -457,7 +486,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 /* old_entry is being shot, lets just lie */
>                 mutex_unlock(&audit_tree_group->mark_mutex);
>                 fsnotify_put_mark(old_entry);
> -               fsnotify_put_mark(&chunk->mark);
> +               fsnotify_put_mark(chunk->mark);
>                 return -ENOENT;
>         }
>
> @@ -1009,7 +1038,7 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
>
>  static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify_group *group)
>  {
> -       struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
> +       struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
>
>         evict_chunk(chunk);
>
> @@ -1030,6 +1059,8 @@ static int __init audit_tree_init(void)
>  {
>         int i;
>
> +       audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
> +
>         audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
>         if (IS_ERR(audit_tree_group))
>                 audit_panic("cannot initialize fsnotify group for rectree watches");
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-11  8:26     ` Jan Kara
@ 2018-07-11  9:01       ` Amir Goldstein
  2018-07-11  9:23         ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2018-07-11  9:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Audit, Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs

On Wed, Jul 11, 2018 at 11:26 AM, Jan Kara <jack@suse.cz> wrote:
> On Wed 11-07-18 10:58:24, Amir Goldstein wrote:
>> On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
>> > Chunk replacement code is very similar for the cases where we grow or
>> > shrink chunk. Factor the code out into a common helper function.
>> >
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>>
>> Ack. Nice re-factoring.
>
> Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
> me whether you meant them as an official tag-worthy statement. So do want
> me to add the tags or do you prefer to stay "anonymous reviewer" ;)?
>

Don't feel confident enough with Reviewed-by/Acked-by on this series,
just letting you know that I eyeballed your patches FWIW ;-)

I'm down to the last one, which is agonizing to make sure nothing is missed
from the diff. At this point [10/10] I'd rather review the final code.

Can you push the work to a branch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-11  9:01       ` Amir Goldstein
@ 2018-07-11  9:23         ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-11  9:23 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Linux Audit, Paul Moore, linux-fsdevel, Al Viro,
	Richard Guy Briggs

On Wed 11-07-18 12:01:35, Amir Goldstein wrote:
> On Wed, Jul 11, 2018 at 11:26 AM, Jan Kara <jack@suse.cz> wrote:
> > On Wed 11-07-18 10:58:24, Amir Goldstein wrote:
> >> On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> >> > Chunk replacement code is very similar for the cases where we grow or
> >> > shrink chunk. Factor the code out into a common helper function.
> >> >
> >> > Signed-off-by: Jan Kara <jack@suse.cz>
> >>
> >> Ack. Nice re-factoring.
> >
> > Thanks. BTW, I didn't add your Acked-by tags because it wasn't obvious to
> > me whether you meant them as an official tag-worthy statement. So do want
> > me to add the tags or do you prefer to stay "anonymous reviewer" ;)?
> >
> 
> Don't feel confident enough with Reviewed-by/Acked-by on this series,
> just letting you know that I eyeballed your patches FWIW ;-)
> 
> I'm down to the last one, which is agonizing to make sure nothing is missed
> from the diff. At this point [10/10] I'd rather review the final code.

Yes, changing lifetime rules is almost always painful, especially in the
code you don't understand too well. It took me a day to understand audit
well enough to be confident to make that change...

> Can you push the work to a branch.

Sure, audit_cleanup branch pushed out to
  git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git

Thanks for your review!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-11  8:57   ` Amir Goldstein
@ 2018-07-11 10:48     ` Amir Goldstein
  2018-07-16 15:13       ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2018-07-11 10:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Audit, Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs

On Wed, Jul 11, 2018 at 11:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
>> Allocate fsnotify mark independently instead of embedding it inside
>> chunk. This will allow us to just replace chunk attached to mark when
>> growing / shrinking chunk instead of replacing mark attached to inode
>> which is a more complex operation.
>>
>> Signed-off-by: Jan Kara <jack@suse.cz>
>
> Ack.

Found minor nit.

[...]
>> +static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
>> +{
>> +       return container_of(entry, struct audit_tree_mark, fsn_mark);
>> +}
>> +
>>  static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
>>  {
>> -       struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
>> +       struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
>>         audit_mark_put_chunk(chunk);
>> +       kmem_cache_free(audit_tree_mark_cachep, entry);

Technically, we should be freeing AUDIT_M(entry)
although it is the same address with current struct layout.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark
  2018-07-10 10:02 ` [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark Jan Kara
@ 2018-07-11 14:17   ` Amir Goldstein
  2018-07-27  4:47   ` Paul Moore
  1 sibling, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2018-07-11 14:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Audit, Paul Moore, linux-fsdevel, Al Viro, Richard Guy Briggs

On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> Audit tree code currently associates new fsnotify mark with each new
> chunk. As chunk attached to an inode is replaced when new tag is added /
> removed, we also need to remove old fsnotify mark and add a new one on
> such occasion.  This is cumbersome and makes locking rules somewhat
> difficult to follow.
>
> Fix these problems by allocating fsnotify mark independently of chunk
> and keeping it all the time while there is some chunk attached to an
> inode. Also add documentation about the locking rules so that things are
> easier to follow.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Well, if there are bugs here I can't find them, but they sure have a lot of
places to hide...

Cheers,
Amir.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-11 10:48     ` Amir Goldstein
@ 2018-07-16 15:13       ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-07-16 15:13 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Linux Audit, Paul Moore, linux-fsdevel, Al Viro,
	Richard Guy Briggs

On Wed 11-07-18 13:48:58, Amir Goldstein wrote:
> On Wed, Jul 11, 2018 at 11:57 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Tue, Jul 10, 2018 at 1:02 PM, Jan Kara <jack@suse.cz> wrote:
> >> Allocate fsnotify mark independently instead of embedding it inside
> >> chunk. This will allow us to just replace chunk attached to mark when
> >> growing / shrinking chunk instead of replacing mark attached to inode
> >> which is a more complex operation.
> >>
> >> Signed-off-by: Jan Kara <jack@suse.cz>
> >
> > Ack.
> 
> Found minor nit.
> 
> [...]
> >> +static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
> >> +{
> >> +       return container_of(entry, struct audit_tree_mark, fsn_mark);
> >> +}
> >> +
> >>  static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
> >>  {
> >> -       struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
> >> +       struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
> >>         audit_mark_put_chunk(chunk);
> >> +       kmem_cache_free(audit_tree_mark_cachep, entry);
> 
> Technically, we should be freeing AUDIT_M(entry)
> although it is the same address with current struct layout.

Right, thanks for spotting this. Fixed in my tree.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] audit_tree: Remove mark->lock locking
  2018-07-10 10:02 ` [PATCH 01/10] audit_tree: Remove mark->lock locking Jan Kara
@ 2018-07-27  4:47   ` Paul Moore
  2018-09-04  9:53     ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> Currently, audit_tree code uses mark->lock to protect against detaching
> of mark from an inode. In most places it however also uses
> mark->group->mark_mutex (as we need to atomically replace attached
> marks) and this provides protection against mark detaching as well. So
> just remove protection with mark->lock from audit tree code and replace
> it with mark->group->mark_mutex protection in all the places. It
> simplifies the code and gets rid of some ugly catches like calling
> fsnotify_add_mark_locked() with mark->lock held (which cannot sleep only
> because we hold a reference to another mark attached to the same inode).
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 24 ++++--------------------
>  1 file changed, 4 insertions(+), 20 deletions(-)

...

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 02feef939560..1c82eb6674c4 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
>                 return -ENOSPC;
>         }
>
> -       spin_lock(&entry->lock);
> +       mutex_lock(&entry->group->mark_mutex);

I wonder if we could move the lock up above the
fsnotify_add_inode_mark() earlier in create_chunk() and use
fsnotify_add_mark_locked()?

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] audit: Fix possible spurious -ENOSPC error
  2018-07-10 10:02 ` [PATCH 02/10] audit: Fix possible spurious -ENOSPC error Jan Kara
@ 2018-07-27  4:47   ` Paul Moore
  2018-09-04 10:00     ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> When an inode is tagged with a tree, tag_chunk() checks whether there is
> audit_tree_group mark attached to the inode and adds one if not. However
> nothing protects another tag_chunk() to add the mark between we've
> checked and try to add the fsnotify mark thus resulting in an error from
> fsnotify_add_mark() and consequently an ENOSPC error from tag_chunk().
>
> Fix the problem by holding mark_mutex over the whole check-insert code
> sequence.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)

...

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 1c82eb6674c4..de8d344d91b1 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
>         spin_lock(&hash_lock);
>  }
>
> +/* Call with group->mark_mutex held, releases it */

Stuff like that always makes me nervous.  Could we defer releasing the
mutex to the caller, after create_chunk() returns?  It looks like
fsnotify_destroy_mark() allows a single level of nesting so it should
be okay, yes?

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] audit: Embed key into chunk
  2018-07-10 10:02 ` [PATCH 04/10] audit: Embed key into chunk Jan Kara
@ 2018-07-27  4:47   ` Paul Moore
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> Currently chunk hash key (which is in fact pointer to the inode) is
> derived as chunk->mark.conn->obj. It is tricky to make this dereference
> reliable for hash table lookups only under RCU as mark can get detached
> from the connector and connector gets freed independently of the
> running lookup. Thus there is a possible use after free / NULL ptr
> dereference issue:
>
> CPU1                                    CPU2
>                                         untag_chunk()
>                                           ...
> audit_tree_lookup()
>   list_for_each_entry_rcu(p, list, hash) {
>                                           list_del_rcu(&chunk->hash);
>                                           fsnotify_destroy_mark(entry);
>                                           fsnotify_put_mark(entry)
>     chunk_to_key(p)
>       if (!chunk->mark.connector)
>                                             ...
>                                             hlist_del_init_rcu(&mark->obj_list);
>                                             if (hlist_empty(&conn->list)) {
>                                               inode = fsnotify_detach_connector_from_object(conn);
>                                             mark->connector = NULL;
>                                             ...
>                                             frees connector from workqueue
>       chunk->mark.connector->obj
>
> This race is probably impossible to hit in practice as the race window
> on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's
> better to have this fixed. Since the inode the chunk is attached to is
> constant during chunk's lifetime it is easy to cache the key in the
> chunk itself and thus avoid these issues.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 27 ++++++++-------------------
>  1 file changed, 8 insertions(+), 19 deletions(-)

This looks okay to me.

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] audit: Factor out chunk replacement code
  2018-07-10 10:02 ` [PATCH 06/10] audit: Factor out chunk replacement code Jan Kara
  2018-07-11  7:58   ` Amir Goldstein
@ 2018-07-27  4:47   ` Paul Moore
  1 sibling, 0 replies; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> Chunk replacement code is very similar for the cases where we grow or
> shrink chunk. Factor the code out into a common helper function.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 86 +++++++++++++++++++++++++----------------------------
>  1 file changed, 40 insertions(+), 46 deletions(-)

Ooof.  That code sucks.  Not your code Jan, rather the code you are
replacing.  Thanks for doing this.

I believe this looks right, although see my previous comment/question
about RCU/barriers/etc. that affects some of this code (inherited from
a previous patch in the patchset).

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 6a01738d1ac2..f419fdfc25b4 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -235,6 +235,38 @@ static struct audit_chunk *find_chunk(struct node *p)
>         return container_of(p, struct audit_chunk, owners[0]);
>  }
>
> +static void replace_chunk(struct audit_chunk *new, struct audit_chunk *old,
> +                         struct node *skip)
> +{
> +       struct audit_tree *owner;
> +       int i, j;
> +
> +       new->key = old->key;
> +       list_replace_init(&old->trees, &new->trees);
> +       list_for_each_entry(owner, &new->trees, same_root)
> +               owner->root = new;
> +       for (i = j = 0; j < old->count; i++, j++) {
> +               if (&old->owners[j] == skip) {
> +                       i--;
> +                       continue;
> +               }
> +               owner = old->owners[j].owner;
> +               new->owners[i].owner = owner;
> +               new->owners[i].index = old->owners[j].index - j + i;
> +               if (!owner) /* result of earlier fallback */
> +                       continue;
> +               get_tree(owner);
> +               list_replace_init(&old->owners[j].list, &new->owners[i].list);
> +       }
> +       /*
> +        * Make sure chunk is fully initialized before making it visible in the
> +        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> +        * audit_tree_lookup().
> +        */
> +       smp_wmb();
> +       list_replace_rcu(&old->hash, &new->hash);
> +}
> +
>  static void untag_chunk(struct node *p)
>  {
>         struct audit_chunk *chunk = find_chunk(p);
> @@ -242,7 +274,6 @@ static void untag_chunk(struct node *p)
>         struct audit_chunk *new = NULL;
>         struct audit_tree *owner;
>         int size = chunk->count - 1;
> -       int i, j;
>
>         fsnotify_get_mark(entry);
>
> @@ -291,38 +322,16 @@ static void untag_chunk(struct node *p)
>
>         chunk->dead = 1;
>         spin_lock(&hash_lock);
> -       new->key = chunk->key;
> -       list_replace_init(&chunk->trees, &new->trees);
>         if (owner->root == chunk) {
>                 list_del_init(&owner->same_root);
>                 owner->root = NULL;
>         }
> -
> -       for (i = j = 0; j <= size; i++, j++) {
> -               struct audit_tree *s;
> -               if (&chunk->owners[j] == p) {
> -                       list_del_init(&p->list);
> -                       i--;
> -                       continue;
> -               }
> -               s = chunk->owners[j].owner;
> -               new->owners[i].owner = s;
> -               new->owners[i].index = chunk->owners[j].index - j + i;
> -               if (!s) /* result of earlier fallback */
> -                       continue;
> -               get_tree(s);
> -               list_replace_init(&chunk->owners[j].list, &new->owners[i].list);
> -       }
> -
> -       list_for_each_entry(owner, &new->trees, same_root)
> -               owner->root = new;
> +       list_del_init(&p->list);
>         /*
> -        * Make sure chunk is fully initialized before making it visible in the
> -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> -        * audit_tree_lookup().
> +        * This has to go last when updating chunk as once replace_chunk() is
> +        * called, new RCU readers can see the new chunk.
>          */
> -       smp_wmb();
> -       list_replace_rcu(&chunk->hash, &new->hash);
> +       replace_chunk(new, chunk, p);
>         spin_unlock(&hash_lock);
>         fsnotify_detach_mark(entry);
>         mutex_unlock(&entry->group->mark_mutex);
> @@ -399,7 +408,6 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
>  static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>  {
>         struct fsnotify_mark *old_entry, *chunk_entry;
> -       struct audit_tree *owner;
>         struct audit_chunk *chunk, *old;
>         struct node *p;
>         int n;
> @@ -464,35 +472,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 fsnotify_put_mark(old_entry);
>                 return 0;
>         }
> -       chunk->key = old->key;
> -       list_replace_init(&old->trees, &chunk->trees);
> -       for (n = 0, p = chunk->owners; n < old->count; n++, p++) {
> -               struct audit_tree *s = old->owners[n].owner;
> -               p->owner = s;
> -               p->index = old->owners[n].index;
> -               if (!s) /* result of fallback in untag */
> -                       continue;
> -               get_tree(s);
> -               list_replace_init(&old->owners[n].list, &p->list);
> -       }
> +       p = &chunk->owners[chunk->count - 1];
>         p->index = (chunk->count - 1) | (1U<<31);
>         p->owner = tree;
>         get_tree(tree);
>         list_add(&p->list, &tree->chunks);
> -       list_for_each_entry(owner, &chunk->trees, same_root)
> -               owner->root = chunk;
>         old->dead = 1;
>         if (!tree->root) {
>                 tree->root = chunk;
>                 list_add(&tree->same_root, &chunk->trees);
>         }
>         /*
> -        * Make sure chunk is fully initialized before making it visible in the
> -        * hash. Pairs with a data dependency barrier in READ_ONCE() in
> -        * audit_tree_lookup().
> +        * This has to go last when updating chunk as once replace_chunk() is
> +        * called, new RCU readers can see the new chunk.
>          */
> -       smp_wmb();
> -       list_replace_rcu(&old->hash, &chunk->hash);
> +       replace_chunk(chunk, old, NULL);
>         spin_unlock(&hash_lock);
>         fsnotify_detach_mark(old_entry);
>         mutex_unlock(&audit_tree_group->mark_mutex);
> --
> 2.16.4

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] audit: Remove pointless check in insert_hash()
  2018-07-10 10:02 ` [PATCH 07/10] audit: Remove pointless check in insert_hash() Jan Kara
@ 2018-07-27  4:47   ` Paul Moore
  0 siblings, 0 replies; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
>
> The audit_tree_group->mark_mutex is held all the time while we create
> the fsnotify mark, add it to the inode, and insert chunk into the hash.
> Hence mark cannot get detached during this time and so the check whether
> the mark is attached in insert_hash() is pointless.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 2 --
>  1 file changed, 2 deletions(-)

This makes sense.

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-10 10:02 ` [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk Jan Kara
  2018-07-11  8:57   ` Amir Goldstein
@ 2018-07-27  4:47   ` Paul Moore
  2018-09-04 14:03     ` Jan Kara
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> Allocate fsnotify mark independently instead of embedding it inside
> chunk. This will allow us to just replace chunk attached to mark when
> growing / shrinking chunk instead of replacing mark attached to inode
> which is a more complex operation.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 14 deletions(-)

...

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index bce3b04a365d..aec9b27a20ff 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -38,6 +38,11 @@ struct audit_chunk {
>         } owners[];
>  };
>
> +struct audit_tree_mark {
> +       struct fsnotify_mark fsn_mark;
> +       struct audit_chunk *chunk;
> +};

It's probably okay to just call it "mark" considering we call
fsnotify_mark fields "mark" elsewhere.  If we are going to change it
in one spot we should probably change it other places as well for the
sake of readability.

2,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
>         call_rcu(&chunk->head, __put_chunk);
>  }
>
> +static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
> +{
> +       return container_of(entry, struct audit_tree_mark, fsn_mark);
> +}

When I see a symbol in all caps I think "macro definition", but this
isn't a macro definition.  I would suggest a different name, dropping
the caps, or converting it into a macro.

Also, unless I missed a call, it seems like after patch 10 all callers
of AUDIT_M end up getting the chunk field; maybe it is better if
AUDIT_M() ends up returning the audit_chunk pointer instead of the
audit_tree_mark pointer (and of course a name change if this is a
reasonable change)?

>  static void audit_tree_destroy_watch(struct fsnotify_mark *entry)
>  {
> -       struct audit_chunk *chunk = container_of(entry, struct audit_chunk, mark);
> +       struct audit_chunk *chunk = AUDIT_M(entry)->chunk;
>         audit_mark_put_chunk(chunk);
> +       kmem_cache_free(audit_tree_mark_cachep, entry);
> +}

I think you've said you've already fixed the above (thanks for the
review Amir!).

> +static struct fsnotify_mark *alloc_fsnotify_mark(void)
> +{
> +       struct audit_tree_mark *mark;
> +
> +       mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
> +       if (!mark)
> +               return NULL;
> +       fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
> +       mark->fsn_mark.mask = FS_IN_IGNORED;
> +       return &mark->fsn_mark;
>  }

Can't we just call it alloc_mark()?  Or did you create the function
earlier in the patchset and I'm missing it now?

[SIDE NOTE: I have some rather big disagreements with the current
naming in this file, but keeping things consistent seems to be a Good
Thing (once again, this is an existing problem not specific to your
patches).]

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark
  2018-07-10 10:02 ` [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark Jan Kara
  2018-07-11 14:17   ` Amir Goldstein
@ 2018-07-27  4:47   ` Paul Moore
  2018-09-04 14:11     ` Jan Kara
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Moore @ 2018-07-27  4:47 UTC (permalink / raw)
  To: jack; +Cc: linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> Audit tree code currently associates new fsnotify mark with each new
> chunk. As chunk attached to an inode is replaced when new tag is added /
> removed, we also need to remove old fsnotify mark and add a new one on
> such occasion.  This is cumbersome and makes locking rules somewhat
> difficult to follow.
>
> Fix these problems by allocating fsnotify mark independently of chunk
> and keeping it all the time while there is some chunk attached to an
> inode. Also add documentation about the locking rules so that things are
> easier to follow.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 163 +++++++++++++++++++++++++++-------------------------
>  1 file changed, 85 insertions(+), 78 deletions(-)

This is a really nice improvement, thanks!

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index aec9b27a20ff..40f61de77dd0 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
>         return container_of(p, struct audit_chunk, owners[0]);
>  }
>
> +static void replace_mark_chunk(struct fsnotify_mark *entry,
> +                              struct audit_chunk *chunk)
> +{
> +       struct audit_chunk *old;
> +
> +       assert_spin_locked(&hash_lock);
> +       old = AUDIT_M(entry)->chunk;
> +       AUDIT_M(entry)->chunk = chunk;
> +       if (chunk)
> +               chunk->mark = entry;
> +       if (old)
> +               old->mark = NULL;

Is it necessary that we check to see if chunk and old are non-NULL?
It seems like we would always want to set chunk->mark to entry and set
old->mark to NULL, yes?

> @@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)
>
>         mutex_lock(&entry->group->mark_mutex);
>         /*
> -        * mark_mutex protects mark from getting detached and thus also from
> -        * mark->connector->obj getting NULL.
> +        * mark_mutex protects mark stabilizes chunk attached to the mark so we
> +        * can check whether it didn't change while we've dropped hash_lock.

I think your new text could use some revision, the "protects mark
stabilizes chunk" is odd.

>          */
> -       if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
> +       if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
> +           AUDIT_M(entry)->chunk != chunk) {
>                 mutex_unlock(&entry->group->mark_mutex);
>                 if (new)
> -                       fsnotify_put_mark(new->mark);
> +                       kfree(new);

Since we are just calling kfree() now we can do away with the "if (new)" check.

--
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] audit_tree: Remove mark->lock locking
  2018-07-27  4:47   ` Paul Moore
@ 2018-09-04  9:53     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-09-04  9:53 UTC (permalink / raw)
  To: Paul Moore; +Cc: jack, linux-audit, linux-fsdevel, viro, rgb, amir73il

Hi,

sorry for getting to this so late but I was catching up after vacation and
your replies got burried in my inbox.

On Fri 27-07-18 00:47:04, Paul Moore wrote:
> On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> > Currently, audit_tree code uses mark->lock to protect against detaching
> > of mark from an inode. In most places it however also uses
> > mark->group->mark_mutex (as we need to atomically replace attached
> > marks) and this provides protection against mark detaching as well. So
> > just remove protection with mark->lock from audit tree code and replace
> > it with mark->group->mark_mutex protection in all the places. It
> > simplifies the code and gets rid of some ugly catches like calling
> > fsnotify_add_mark_locked() with mark->lock held (which cannot sleep only
> > because we hold a reference to another mark attached to the same inode).
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 24 ++++--------------------
> >  1 file changed, 4 insertions(+), 20 deletions(-)
> 
> ...
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 02feef939560..1c82eb6674c4 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -360,12 +355,12 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
> >                 return -ENOSPC;
> >         }
> >
> > -       spin_lock(&entry->lock);
> > +       mutex_lock(&entry->group->mark_mutex);
> 
> I wonder if we could move the lock up above the
> fsnotify_add_inode_mark() earlier in create_chunk() and use
> fsnotify_add_mark_locked()?

Possibly, but I didn't want to do this in this patch as that's a separate
change. Also this is what in fact happens in later patches.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] audit: Fix possible spurious -ENOSPC error
  2018-07-27  4:47   ` Paul Moore
@ 2018-09-04 10:00     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-09-04 10:00 UTC (permalink / raw)
  To: Paul Moore; +Cc: jack, linux-audit, linux-fsdevel, viro, rgb, amir73il

On Fri 27-07-18 00:47:10, Paul Moore wrote:
> On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> > When an inode is tagged with a tree, tag_chunk() checks whether there is
> > audit_tree_group mark attached to the inode and adds one if not. However
> > nothing protects another tag_chunk() to add the mark between we've
> > checked and try to add the fsnotify mark thus resulting in an error from
> > fsnotify_add_mark() and consequently an ENOSPC error from tag_chunk().
> >
> > Fix the problem by holding mark_mutex over the whole check-insert code
> > sequence.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> ...
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 1c82eb6674c4..de8d344d91b1 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -342,25 +342,29 @@ static void untag_chunk(struct node *p)
> >         spin_lock(&hash_lock);
> >  }
> >
> > +/* Call with group->mark_mutex held, releases it */
> 
> Stuff like that always makes me nervous.

Yes, I also prefer to avoid stuff like this.

> Could we defer releasing the mutex to the caller, after create_chunk()
> returns?  It looks like fsnotify_destroy_mark() allows a single level of
> nesting so it should be okay, yes?

This won't work. fsnotify_destroy_mark() would try to acquire the same
mutex and block indefinitely (the nesting depth is there just for lockdep
so that you can possibly nest mark_mutexes of two different group's). And
even if we do more work and use separate fsnotify_detach_mark() and
fsnotify_free_mark() calls instead of fsnotify_destroy_mark(), the problem
is still there as fsnotify_free_mark() must not be called under mark_mutex
(as it can acquire it in some cases).

So as much as I don't like functions that release locks they didn't take I
don't see how to avoid that here without creating even bigger mess.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-07-27  4:47   ` Paul Moore
@ 2018-09-04 14:03     ` Jan Kara
  2018-09-04 14:07       ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2018-09-04 14:03 UTC (permalink / raw)
  To: Paul Moore; +Cc: jack, linux-audit, linux-fsdevel, viro, rgb, amir73il

On Fri 27-07-18 00:47:37, Paul Moore wrote:
> On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> > Allocate fsnotify mark independently instead of embedding it inside
> > chunk. This will allow us to just replace chunk attached to mark when
> > growing / shrinking chunk instead of replacing mark attached to inode
> > which is a more complex operation.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 45 insertions(+), 14 deletions(-)
> 
> ...
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index bce3b04a365d..aec9b27a20ff 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -38,6 +38,11 @@ struct audit_chunk {
> >         } owners[];
> >  };
> >
> > +struct audit_tree_mark {
> > +       struct fsnotify_mark fsn_mark;
> > +       struct audit_chunk *chunk;
> > +};
> 
> It's probably okay to just call it "mark" considering we call
> fsnotify_mark fields "mark" elsewhere.  If we are going to change it
> in one spot we should probably change it other places as well for the
> sake of readability.

The current notation is that 'fsn_mark' (or sometimes 'entry') is struct
fsnotify_mark while plain 'mark' is struct audit_tree_mark (well, except
for audit_chunk AFAICS). So just replacing fsn_mark with mark is IMO going
to cause more confusion. But if you prefer different naming convention,
this is the right moment to bring some consistency into the whole thing.
So how do you prefer to differentiate between fsnotify_mark and
audit_tree_mark?

> 2,10 +148,28 @@ static void audit_mark_put_chunk(struct audit_chunk *chunk)
> >         call_rcu(&chunk->head, __put_chunk);
> >  }
> >
> > +static inline struct audit_tree_mark *AUDIT_M(struct fsnotify_mark *entry)
> > +{
> > +       return container_of(entry, struct audit_tree_mark, fsn_mark);
> > +}
> 
> When I see a symbol in all caps I think "macro definition", but this
> isn't a macro definition.  I would suggest a different name, dropping
> the caps, or converting it into a macro.

I want the inline function for type safety. All caps for this function is a
tradition from filesystem space where all FOO_I(inode) or FOO_SB(sb)
helpers are all caps. I then inherited it for fs/notify/. So it is
consistent with some code. But if you still don't like all caps, I can
change the name... Hmm, given your comment below, I guess I'll just change
the name since it will have exactly two call sites.

> Also, unless I missed a call, it seems like after patch 10 all callers
> of AUDIT_M end up getting the chunk field; maybe it is better if
> AUDIT_M() ends up returning the audit_chunk pointer instead of the
> audit_tree_mark pointer (and of course a name change if this is a
> reasonable change)?

Good point. All but one - audit_tree_destroy_watch(). I'll create a helper
for this.

> > +static struct fsnotify_mark *alloc_fsnotify_mark(void)
> > +{
> > +       struct audit_tree_mark *mark;
> > +
> > +       mark = kmem_cache_zalloc(audit_tree_mark_cachep, GFP_KERNEL);
> > +       if (!mark)
> > +               return NULL;
> > +       fsnotify_init_mark(&mark->fsn_mark, audit_tree_group);
> > +       mark->fsn_mark.mask = FS_IN_IGNORED;
> > +       return &mark->fsn_mark;
> >  }
> 
> Can't we just call it alloc_mark()?  Or did you create the function
> earlier in the patchset and I'm missing it now?

No, previously mark got allocated as a part of alloc_chunk() as it was
embedded there. OK, I can call this alloc_mark().

> [SIDE NOTE: I have some rather big disagreements with the current
> naming in this file, but keeping things consistent seems to be a Good
> Thing (once again, this is an existing problem not specific to your
> patches).]

I agree some of the names are tad bit confusing. But not that I'd have
great idea how to improve that.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk
  2018-09-04 14:03     ` Jan Kara
@ 2018-09-04 14:07       ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-09-04 14:07 UTC (permalink / raw)
  To: Paul Moore; +Cc: jack, linux-audit, linux-fsdevel, viro, rgb, amir73il

On Tue 04-09-18 16:03:07, Jan Kara wrote:
> On Fri 27-07-18 00:47:37, Paul Moore wrote:
> > On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> > > Allocate fsnotify mark independently instead of embedding it inside
> > > chunk. This will allow us to just replace chunk attached to mark when
> > > growing / shrinking chunk instead of replacing mark attached to inode
> > > which is a more complex operation.
> > >
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  kernel/audit_tree.c | 59 ++++++++++++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 45 insertions(+), 14 deletions(-)
> > 
> > ...
> > 
> > > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > > index bce3b04a365d..aec9b27a20ff 100644
> > > --- a/kernel/audit_tree.c
> > > +++ b/kernel/audit_tree.c
> > > @@ -38,6 +38,11 @@ struct audit_chunk {
> > >         } owners[];
> > >  };
> > >
> > > +struct audit_tree_mark {
> > > +       struct fsnotify_mark fsn_mark;
> > > +       struct audit_chunk *chunk;
> > > +};
> > 
> > It's probably okay to just call it "mark" considering we call
> > fsnotify_mark fields "mark" elsewhere.  If we are going to change it
> > in one spot we should probably change it other places as well for the
> > sake of readability.
> 
> The current notation is that 'fsn_mark' (or sometimes 'entry') is struct
> fsnotify_mark while plain 'mark' is struct audit_tree_mark (well, except
> for audit_chunk AFAICS). So just replacing fsn_mark with mark is IMO going
> to cause more confusion. But if you prefer different naming convention,
> this is the right moment to bring some consistency into the whole thing.
> So how do you prefer to differentiate between fsnotify_mark and
> audit_tree_mark?

After searching the code and given your observation that audit_tree_mark is
rarely directly used, I guess I'll just make fsn_mark -> mark, entry->mark
renaming and invent some name for the few places where we use
audit_tree_mark directly.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark
  2018-07-27  4:47   ` Paul Moore
@ 2018-09-04 14:11     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2018-09-04 14:11 UTC (permalink / raw)
  To: Paul Moore; +Cc: jack, linux-audit, linux-fsdevel, viro, rgb, amir73il

On Fri 27-07-18 00:47:42, Paul Moore wrote:
> On Tue, Jul 10, 2018 at 6:02 AM Jan Kara <jack@suse.cz> wrote:
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index aec9b27a20ff..40f61de77dd0 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -272,6 +273,20 @@ static struct audit_chunk *find_chunk(struct node *p)
> >         return container_of(p, struct audit_chunk, owners[0]);
> >  }
> >
> > +static void replace_mark_chunk(struct fsnotify_mark *entry,
> > +                              struct audit_chunk *chunk)
> > +{
> > +       struct audit_chunk *old;
> > +
> > +       assert_spin_locked(&hash_lock);
> > +       old = AUDIT_M(entry)->chunk;
> > +       AUDIT_M(entry)->chunk = chunk;
> > +       if (chunk)
> > +               chunk->mark = entry;
> > +       if (old)
> > +               old->mark = NULL;
> 
> Is it necessary that we check to see if chunk and old are non-NULL?
> It seems like we would always want to set chunk->mark to entry and set
> old->mark to NULL, yes?

Both checks are needed - 'old' can be NULL if we use replace_mark_chunk()
to attach first chunk to mark. 'chunk' can be NULL if we use
replace_mark_chunk() to detach mark from current chunk when destroying it.

> > @@ -321,29 +341,31 @@ static void untag_chunk(struct node *p)
> >
> >         mutex_lock(&entry->group->mark_mutex);
> >         /*
> > -        * mark_mutex protects mark from getting detached and thus also from
> > -        * mark->connector->obj getting NULL.
> > +        * mark_mutex protects mark stabilizes chunk attached to the mark so we
> > +        * can check whether it didn't change while we've dropped hash_lock.
> 
> I think your new text could use some revision, the "protects mark
> stabilizes chunk" is odd.

Yup, I'll fix that.

> >          */
> > -       if (chunk->dead || !(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
> > +       if (!(entry->flags & FSNOTIFY_MARK_FLAG_ATTACHED) ||
> > +           AUDIT_M(entry)->chunk != chunk) {
> >                 mutex_unlock(&entry->group->mark_mutex);
> >                 if (new)
> > -                       fsnotify_put_mark(new->mark);
> > +                       kfree(new);
> 
> Since we are just calling kfree() now we can do away with the "if (new)"
> check.

Right, I'll do that.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-09-04 18:36 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-10 10:02 [PATCH 0/10 v2] audit: Fix various races when tagging and untagging mounts Jan Kara
2018-07-10 10:02 ` [PATCH 01/10] audit_tree: Remove mark->lock locking Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-09-04  9:53     ` Jan Kara
2018-07-10 10:02 ` [PATCH 02/10] audit: Fix possible spurious -ENOSPC error Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-09-04 10:00     ` Jan Kara
2018-07-10 10:02 ` [PATCH 03/10] audit: Fix possible tagging failures Jan Kara
2018-07-10 10:02 ` [PATCH 04/10] audit: Embed key into chunk Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-07-10 10:02 ` [PATCH 05/10] audit: Make hash table insertion safe against concurrent lookups Jan Kara
2018-07-10 10:02 ` [PATCH 06/10] audit: Factor out chunk replacement code Jan Kara
2018-07-11  7:58   ` Amir Goldstein
2018-07-11  8:26     ` Jan Kara
2018-07-11  9:01       ` Amir Goldstein
2018-07-11  9:23         ` Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-07-10 10:02 ` [PATCH 07/10] audit: Remove pointless check in insert_hash() Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-07-10 10:02 ` [PATCH 08/10] audit: Provide helper for dropping mark's chunk reference Jan Kara
2018-07-10 10:02 ` [PATCH 09/10] audit: Allocate fsnotify mark independently of chunk Jan Kara
2018-07-11  8:57   ` Amir Goldstein
2018-07-11 10:48     ` Amir Goldstein
2018-07-16 15:13       ` Jan Kara
2018-07-27  4:47   ` Paul Moore
2018-09-04 14:03     ` Jan Kara
2018-09-04 14:07       ` Jan Kara
2018-07-10 10:02 ` [PATCH 10/10] audit: Replace chunk attached to mark instead of replacing mark Jan Kara
2018-07-11 14:17   ` Amir Goldstein
2018-07-27  4:47   ` Paul Moore
2018-09-04 14:11     ` Jan Kara
2018-07-10 10:02 ` [PATCH 11/10 TESTSUITE] audit_testsuite: Add stress test for tree watches Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).