All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] assoc_array: Fix BUG_ON during garbage collect
@ 2022-05-11 22:55 Stephen Brennan
  2022-05-11 23:15 ` Stephen Brennan
  2022-05-12 21:49 ` Stephen Brennan
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-05-11 22:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, David Howells, Len Baker, Gustavo A. R. Silva,
	Stephen Brennan

A rare BUG_ON triggered in assoc_array_gc:

    [3430308.818153] kernel BUG at lib/assoc_array.c:1609!

Which corresponded to the statement currently at line 1593 upstream:

    BUG_ON(assoc_array_ptr_is_meta(p));

Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.

[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc

After running the iterator on the entire branch, an internal tree node
looked like the following:

    NODE (nr_leaves_on_branch: 3)
      SLOT [0] NODE (2 leaves)
      SLOT [1] NODE (1 leaf)
      SLOT [2..f] NODE (empty)

In the userspace reproducer, the pr_devel output when compressing this
node was:

    -- compress node 0x5607cc089380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    after: 3

At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.

The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.

To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 1 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:

    -- compress node 0x560e9c562380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    internal nodes remain despite enough space, retrying
    -- compress node 0x560e9c562380 --
    free=14, leaves=1
    [0] fold node 2/15 [nx 0]
    after: 3

Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

Hi Andrew,

As far as I can tell, lib/assoc_array.c has no maintainer, so I'm
sending this bugfix to you. Hopefully David can take a look at this and
verify sure it's all sane. I tested it on my userspace reproducer, and
also by booting it and exercising the keyring_gc functions a bit.

Thanks,
Stephen

 lib/assoc_array.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/assoc_array.c b/lib/assoc_array.c
index 079c72e26493..7ed20233a770 100644
--- a/lib/assoc_array.c
+++ b/lib/assoc_array.c
@@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *array,
 	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
 	unsigned long nr_leaves_on_tree;
 	int keylen, slot, nr_free, next_slot, i;
+	bool retained;
 
 	pr_devel("-->%s()\n", __func__);
 
@@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
 		goto descend;
 	}
 
+retry_compress:
 	pr_devel("-- compress node %p --\n", new_n);
 
 	/* Count up the number of empty slots in this node and work out the
@@ -1554,6 +1556,7 @@ int assoc_array_gc(struct assoc_array *array,
 
 	/* See what we can fold in */
 	next_slot = 0;
+	retained = 0;
 	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
 		struct assoc_array_shortcut *s;
 		struct assoc_array_node *child;
@@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
 			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
 				 slot, child->nr_leaves_on_branch, nr_free + 1,
 				 next_slot);
+			retained = true;
 		}
 	}
 
+	if (retained && new_n->nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT) {
+		pr_devel("internal nodes remain despite neough space, retrying\n");
+		goto retry_compress;
+	}
 	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
 
 	nr_leaves_on_tree = new_n->nr_leaves_on_branch;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-11 22:55 [PATCH] assoc_array: Fix BUG_ON during garbage collect Stephen Brennan
@ 2022-05-11 23:15 ` Stephen Brennan
  2022-05-12 21:49 ` Stephen Brennan
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-05-11 23:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, David Howells, Len Baker, Gustavo A. R. Silva, keyrings

Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> A rare BUG_ON triggered in assoc_array_gc:
>
>     [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
>
> Which corresponded to the statement currently at line 1593 upstream:
>
>     BUG_ON(assoc_array_ptr_is_meta(p));
>
> Using the data from the core dump, I was able to generate a userspace
> reproducer[1] and determine the cause of the bug.
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
>
> After running the iterator on the entire branch, an internal tree node
> looked like the following:
>
>     NODE (nr_leaves_on_branch: 3)
>       SLOT [0] NODE (2 leaves)
>       SLOT [1] NODE (1 leaf)
>       SLOT [2..f] NODE (empty)
>
> In the userspace reproducer, the pr_devel output when compressing this
> node was:
>
>     -- compress node 0x5607cc089380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     after: 3
>
> At slot 0, an internal node with 2 leaves could not be folded into the
> node, because there was only one available slot (slot 0). Thus, the
> internal node was retained. At slot 1, the node had one leaf, and was
> able to be folded in successfully. The remaining nodes had no leaves,
> and so were removed. By the end of the compression stage, there were 14
> free slots, and only 3 leaf nodes. The tree was ascended and then its
> parent node was compressed. When this node was seen, it could not be
> folded, due to the internal node it contained.
>
> The invariant for compression in this function is: whenever
> nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
> leaf nodes. The compression step currently cannot guarantee this, given
> the corner case shown above.
>
> To fix this issue, retry compression whenever we have retained a node,
> and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
> compression will then allow the node in slot 1 to be folded in,
> satisfying the invariant. Below is the output of the reproducer once the
> fix is applied:
>
>     -- compress node 0x560e9c562380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     internal nodes remain despite enough space, retrying
>     -- compress node 0x560e9c562380 --
>     free=14, leaves=1
>     [0] fold node 2/15 [nx 0]
>     after: 3
>
> Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>
> Hi Andrew,
>
> As far as I can tell, lib/assoc_array.c has no maintainer, so I'm
> sending this bugfix to you. Hopefully David can take a look at this and
> verify sure it's all sane. I tested it on my userspace reproducer, and
> also by booting it and exercising the keyring_gc functions a bit.

Actually, by my searching, lib/assoc_array.c is only used by the
KEYS/KEYRINGS subsystem, which is co-maintained by David, who also wrote
the array. Maybe this file ought to be added to that subsystem in
MAINTAINERS?

Adding keyrings@vger.kernel.org to this thread.

Stephen

>
> Thanks,
> Stephen
>
>  lib/assoc_array.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..7ed20233a770 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c
> @@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
>  	unsigned long nr_leaves_on_tree;
>  	int keylen, slot, nr_free, next_slot, i;
> +	bool retained;
>  
>  	pr_devel("-->%s()\n", __func__);
>  
> @@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
>  		goto descend;
>  	}
>  
> +retry_compress:
>  	pr_devel("-- compress node %p --\n", new_n);
>  
>  	/* Count up the number of empty slots in this node and work out the
> @@ -1554,6 +1556,7 @@ int assoc_array_gc(struct assoc_array *array,
>  
>  	/* See what we can fold in */
>  	next_slot = 0;
> +	retained = 0;
>  	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
>  		struct assoc_array_shortcut *s;
>  		struct assoc_array_node *child;
> @@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
>  			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
>  				 slot, child->nr_leaves_on_branch, nr_free + 1,
>  				 next_slot);
> +			retained = true;
>  		}
>  	}
>  
> +	if (retained && new_n->nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT) {
> +		pr_devel("internal nodes remain despite neough space, retrying\n");
> +		goto retry_compress;
> +	}
>  	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
>  
>  	nr_leaves_on_tree = new_n->nr_leaves_on_branch;
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-11 22:55 [PATCH] assoc_array: Fix BUG_ON during garbage collect Stephen Brennan
  2022-05-11 23:15 ` Stephen Brennan
@ 2022-05-12 21:49 ` Stephen Brennan
  2022-05-12 21:50   ` [PATCH v2] " Stephen Brennan
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen Brennan @ 2022-05-12 21:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David Howells, Len Baker, Gustavo A. R. Silva

Stephen Brennan <stephen.s.brennan@oracle.com> writes:
> A rare BUG_ON triggered in assoc_array_gc:
>
>     [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
>
> Which corresponded to the statement currently at line 1593 upstream:
>
>     BUG_ON(assoc_array_ptr_is_meta(p));
>
> Using the data from the core dump, I was able to generate a userspace
> reproducer[1] and determine the cause of the bug.
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
>
> After running the iterator on the entire branch, an internal tree node
> looked like the following:
>
>     NODE (nr_leaves_on_branch: 3)
>       SLOT [0] NODE (2 leaves)
>       SLOT [1] NODE (1 leaf)
>       SLOT [2..f] NODE (empty)
>
> In the userspace reproducer, the pr_devel output when compressing this
> node was:
>
>     -- compress node 0x5607cc089380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     after: 3
>
> At slot 0, an internal node with 2 leaves could not be folded into the
> node, because there was only one available slot (slot 0). Thus, the
> internal node was retained. At slot 1, the node had one leaf, and was
> able to be folded in successfully. The remaining nodes had no leaves,
> and so were removed. By the end of the compression stage, there were 14
> free slots, and only 3 leaf nodes. The tree was ascended and then its
> parent node was compressed. When this node was seen, it could not be
> folded, due to the internal node it contained.
>
> The invariant for compression in this function is: whenever
> nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
> leaf nodes. The compression step currently cannot guarantee this, given
> the corner case shown above.
>
> To fix this issue, retry compression whenever we have retained a node,
> and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
> compression will then allow the node in slot 1 to be folded in,
> satisfying the invariant. Below is the output of the reproducer once the
> fix is applied:
>
>     -- compress node 0x560e9c562380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     internal nodes remain despite enough space, retrying
>     -- compress node 0x560e9c562380 --
>     free=14, leaves=1
>     [0] fold node 2/15 [nx 0]
>     after: 3
>
> Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>
> Hi Andrew,
>
> As far as I can tell, lib/assoc_array.c has no maintainer, so I'm
> sending this bugfix to you. Hopefully David can take a look at this and
> verify sure it's all sane. I tested it on my userspace reproducer, and
> also by booting it and exercising the keyring_gc functions a bit.
>
> Thanks,
> Stephen
>
>  lib/assoc_array.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..7ed20233a770 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c
> @@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
>  	unsigned long nr_leaves_on_tree;
>  	int keylen, slot, nr_free, next_slot, i;
> +	bool retained;
>  
>  	pr_devel("-->%s()\n", __func__);
>  
> @@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
>  		goto descend;
>  	}
>  
> +retry_compress:
>  	pr_devel("-- compress node %p --\n", new_n);
>  
>  	/* Count up the number of empty slots in this node and work out the
> @@ -1554,6 +1556,7 @@ int assoc_array_gc(struct assoc_array *array,
>  
>  	/* See what we can fold in */
>  	next_slot = 0;
> +	retained = 0;
>  	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
>  		struct assoc_array_shortcut *s;
>  		struct assoc_array_node *child;
> @@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
>  			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
>  				 slot, child->nr_leaves_on_branch, nr_free + 1,
>  				 next_slot);
> +			retained = true;
>  		}
>  	}
>  
> +	if (retained && new_n->nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT) {

Hi Andrew,

I'm sorry to ask this of you - but would you be willing to yank this
patch out and replace it with the v2 I'll send in reply to this? The
above test should be a "<=" condition, not a "<". And of course you
caught my spelling mistake on the line below. Both will be corrected in
the v2.

thanks,
Stephen

> +		pr_devel("internal nodes remain despite neough space, retrying\n");
> +		goto retry_compress;
> +	}
>  	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
>  
>  	nr_leaves_on_tree = new_n->nr_leaves_on_branch;
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] assoc_array: Fix BUG_ON during garbage collect
  2022-05-12 21:49 ` Stephen Brennan
@ 2022-05-12 21:50   ` Stephen Brennan
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-05-12 21:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, David Howells, Len Baker, Gustavo A. R. Silva,
	Stephen Brennan

A rare BUG_ON triggered in assoc_array_gc:

    [3430308.818153] kernel BUG at lib/assoc_array.c:1609!

Which corresponded to the statement currently at line 1593 upstream:

    BUG_ON(assoc_array_ptr_is_meta(p));

Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.

[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc

After running the iterator on the entire branch, an internal tree node
looked like the following:

    NODE (nr_leaves_on_branch: 3)
      SLOT [0] NODE (2 leaves)
      SLOT [1] NODE (1 leaf)
      SLOT [2..f] NODE (empty)

In the userspace reproducer, the pr_devel output when compressing this
node was:

    -- compress node 0x5607cc089380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    after: 3

At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.

The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.

To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 1 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:

    -- compress node 0x560e9c562380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    internal nodes remain despite enough space, retrying
    -- compress node 0x560e9c562380 --
    free=14, leaves=1
    [0] fold node 2/15 [nx 0]
    after: 3

Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

v2: fix typo in pr_devel, correct comparison to "<="

 lib/assoc_array.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/assoc_array.c b/lib/assoc_array.c
index 079c72e26493..4c4967b13b2d 100644
--- a/lib/assoc_array.c
+++ b/lib/assoc_array.c
@@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *array,
 	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
 	unsigned long nr_leaves_on_tree;
 	int keylen, slot, nr_free, next_slot, i;
+	bool retained;
 
 	pr_devel("-->%s()\n", __func__);
 
@@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
 		goto descend;
 	}
 
+retry_compress:
 	pr_devel("-- compress node %p --\n", new_n);
 
 	/* Count up the number of empty slots in this node and work out the
@@ -1554,6 +1556,7 @@ int assoc_array_gc(struct assoc_array *array,
 
 	/* See what we can fold in */
 	next_slot = 0;
+	retained = 0;
 	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
 		struct assoc_array_shortcut *s;
 		struct assoc_array_node *child;
@@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
 			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
 				 slot, child->nr_leaves_on_branch, nr_free + 1,
 				 next_slot);
+			retained = true;
 		}
 	}
 
+	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
+		pr_devel("internal nodes remain despite enough space, retrying\n");
+		goto retry_compress;
+	}
 	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
 
 	nr_leaves_on_tree = new_n->nr_leaves_on_branch;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-06-02  1:31   ` Linus Torvalds
@ 2022-06-02 17:05     ` Stephen Brennan
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-06-02 17:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, stable, Jarkko Sakkinen, Andrew Morton, keyrings,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, Jun 1, 2022 at 3:00 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Just wanted to check on this patch as the 5.19 window closes. David, are
>> you planning on taking this through a particular tree, or is the ask for
>> Linus to pick it directly?
>
> Ok, picked up directly.
>
> These fall through the cracks partly because it's not obvious what
> they are for. Sometimes I get pull requests from DavidH, and sometimes
> I get random patches, and while the pull requests are fairly
> unambiguous ("please pull") the same is not necessarily true of the
> patches. Are they for discussion, an RFC, or fro applying...

I understand the confusion, thanks for taking this up!

Stephen

>
> So then I pretty much guess.
>
>                  Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-06-01 21:59 ` Stephen Brennan
@ 2022-06-02  1:31   ` Linus Torvalds
  2022-06-02 17:05     ` Stephen Brennan
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2022-06-02  1:31 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: David Howells, stable, Jarkko Sakkinen, Andrew Morton, keyrings,
	Linux Kernel Mailing List

On Wed, Jun 1, 2022 at 3:00 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Just wanted to check on this patch as the 5.19 window closes. David, are
> you planning on taking this through a particular tree, or is the ask for
> Linus to pick it directly?

Ok, picked up directly.

These fall through the cracks partly because it's not obvious what
they are for. Sometimes I get pull requests from DavidH, and sometimes
I get random patches, and while the pull requests are fairly
unambiguous ("please pull") the same is not necessarily true of the
patches. Are they for discussion, an RFC, or fro applying...

So then I pretty much guess.

                 Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-06-01 22:41 ` Eric Biggers
@ 2022-06-01 23:03   ` Stephen Brennan
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-06-01 23:03 UTC (permalink / raw)
  To: Eric Biggers, David Howells
  Cc: torvalds, stable, Jarkko Sakkinen, Andrew Morton, keyrings, linux-kernel

Eric Biggers <ebiggers@kernel.org> writes:

> On Thu, May 19, 2022 at 09:50:30AM +0100, David Howells wrote:
>> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
>> index 079c72e26493..ca0b4f360c1a 100644
>> --- a/lib/assoc_array.c
>> +++ b/lib/assoc_array.c
>
> Where are the tests for this file?

As of today there are none:

$ grep -lr assoc_array
lib/assoc_array.c
lib/Makefile
lib/Kconfig
arch/powerpc/mm/numa.c
arch/powerpc/platforms/pseries/hotplug-memory.c
Documentation/core-api/index.rst
Documentation/core-api/assoc_array.rst
Documentation/translations/zh_CN/core-api/index.rst
Documentation/translations/zh_CN/core-api/assoc_array.rst
include/linux/assoc_array.h
include/linux/key.h
include/linux/assoc_array_priv.h
include/keys/keyring-type.h
security/keys/internal.h
security/keys/key.c
security/keys/keyring.c
security/keys/request_key.c

The assoc_array code is easy to get up and running in userspace (see the
reproducer for this bug). So testing it would be feasible with some sort
of userspace runner (KUnit?). Seems it hasn't been done yet.

Stephen

>
> - Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-19  8:50 [PATCH] " David Howells
                   ` (2 preceding siblings ...)
  2022-06-01 21:59 ` Stephen Brennan
@ 2022-06-01 22:41 ` Eric Biggers
  2022-06-01 23:03   ` Stephen Brennan
  3 siblings, 1 reply; 12+ messages in thread
From: Eric Biggers @ 2022-06-01 22:41 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, stable, Stephen Brennan, Jarkko Sakkinen,
	Andrew Morton, keyrings, linux-kernel

On Thu, May 19, 2022 at 09:50:30AM +0100, David Howells wrote:
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..ca0b4f360c1a 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c

Where are the tests for this file?

- Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-19  8:50 [PATCH] " David Howells
  2022-05-19 22:17 ` Jarkko Sakkinen
  2022-05-23 17:39 ` Stephen Brennan
@ 2022-06-01 21:59 ` Stephen Brennan
  2022-06-02  1:31   ` Linus Torvalds
  2022-06-01 22:41 ` Eric Biggers
  3 siblings, 1 reply; 12+ messages in thread
From: Stephen Brennan @ 2022-06-01 21:59 UTC (permalink / raw)
  To: David Howells, torvalds
  Cc: stable, Jarkko Sakkinen, Andrew Morton, keyrings, dhowells, linux-kernel

David Howells <dhowells@redhat.com> writes:
> From: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> A rare BUG_ON triggered in assoc_array_gc:
>
>     [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
>
> Which corresponded to the statement currently at line 1593 upstream:
>
>     BUG_ON(assoc_array_ptr_is_meta(p));
>
> Using the data from the core dump, I was able to generate a userspace
> reproducer[1] and determine the cause of the bug.
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
>
> After running the iterator on the entire branch, an internal tree node
> looked like the following:
>
>     NODE (nr_leaves_on_branch: 3)
>       SLOT [0] NODE (2 leaves)
>       SLOT [1] NODE (1 leaf)
>       SLOT [2..f] NODE (empty)
>
> In the userspace reproducer, the pr_devel output when compressing this
> node was:
>
>     -- compress node 0x5607cc089380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     after: 3
>
> At slot 0, an internal node with 2 leaves could not be folded into the
> node, because there was only one available slot (slot 0). Thus, the
> internal node was retained. At slot 1, the node had one leaf, and was
> able to be folded in successfully. The remaining nodes had no leaves,
> and so were removed. By the end of the compression stage, there were 14
> free slots, and only 3 leaf nodes. The tree was ascended and then its
> parent node was compressed. When this node was seen, it could not be
> folded, due to the internal node it contained.
>
> The invariant for compression in this function is: whenever
> nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
> leaf nodes. The compression step currently cannot guarantee this, given
> the corner case shown above.
>
> To fix this issue, retry compression whenever we have retained a node,
> and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
> compression will then allow the node in slot 1 to be folded in,
> satisfying the invariant. Below is the output of the reproducer once the
> fix is applied:
>
>     -- compress node 0x560e9c562380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     internal nodes remain despite enough space, retrying
>     -- compress node 0x560e9c562380 --
>     free=14, leaves=1
>     [0] fold node 2/15 [nx 0]
>     after: 3
>
> Changes
> =======
> DH:
>  - Use false instead of 0.
>  - Reorder the inserted lines in a couple of places to put retained before
>    next_slot.
>
> ver #2)
>  - Fix typo in pr_devel, correct comparison to "<="
>
>
> Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jarkko Sakkinen <jarkko@kernel.org>
> cc: Andrew Morton <akpm@linux-foundation.org>
> cc: keyrings@vger.kernel.org
> Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
> Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
> ---
>
>  lib/assoc_array.c |    8 ++++++++
>  1 file changed, 8 insertions(+)

Hi,

Just wanted to check on this patch as the 5.19 window closes. David, are
you planning on taking this through a particular tree, or is the ask for
Linus to pick it directly?

Thanks,
Stephen

>
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..ca0b4f360c1a 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c
> @@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	struct assoc_array_ptr *cursor, *ptr;
>  	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
>  	unsigned long nr_leaves_on_tree;
> +	bool retained;
>  	int keylen, slot, nr_free, next_slot, i;
>  
>  	pr_devel("-->%s()\n", __func__);
> @@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
>  		goto descend;
>  	}
>  
> +retry_compress:
>  	pr_devel("-- compress node %p --\n", new_n);
>  
>  	/* Count up the number of empty slots in this node and work out the
> @@ -1553,6 +1555,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
>  
>  	/* See what we can fold in */
> +	retained = false;
>  	next_slot = 0;
>  	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
>  		struct assoc_array_shortcut *s;
> @@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
>  			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
>  				 slot, child->nr_leaves_on_branch, nr_free + 1,
>  				 next_slot);
> +			retained = true;
>  		}
>  	}
>  
> +	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
> +		pr_devel("internal nodes remain despite enough space, retrying\n");
> +		goto retry_compress;
> +	}
>  	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
>  
>  	nr_leaves_on_tree = new_n->nr_leaves_on_branch;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-19  8:50 [PATCH] " David Howells
  2022-05-19 22:17 ` Jarkko Sakkinen
@ 2022-05-23 17:39 ` Stephen Brennan
  2022-06-01 21:59 ` Stephen Brennan
  2022-06-01 22:41 ` Eric Biggers
  3 siblings, 0 replies; 12+ messages in thread
From: Stephen Brennan @ 2022-05-23 17:39 UTC (permalink / raw)
  To: David Howells, torvalds
  Cc: stable, Jarkko Sakkinen, Andrew Morton, keyrings, dhowells, linux-kernel

David Howells <dhowells@redhat.com> writes:
> From: Stephen Brennan <stephen.s.brennan@oracle.com>
>
> A rare BUG_ON triggered in assoc_array_gc:
>
>     [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
>
> Which corresponded to the statement currently at line 1593 upstream:
>
>     BUG_ON(assoc_array_ptr_is_meta(p));
>
> Using the data from the core dump, I was able to generate a userspace
> reproducer[1] and determine the cause of the bug.
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
>
> After running the iterator on the entire branch, an internal tree node
> looked like the following:
>
>     NODE (nr_leaves_on_branch: 3)
>       SLOT [0] NODE (2 leaves)
>       SLOT [1] NODE (1 leaf)
>       SLOT [2..f] NODE (empty)
>
> In the userspace reproducer, the pr_devel output when compressing this
> node was:
>
>     -- compress node 0x5607cc089380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     after: 3
>
> At slot 0, an internal node with 2 leaves could not be folded into the
> node, because there was only one available slot (slot 0). Thus, the
> internal node was retained. At slot 1, the node had one leaf, and was
> able to be folded in successfully. The remaining nodes had no leaves,
> and so were removed. By the end of the compression stage, there were 14
> free slots, and only 3 leaf nodes. The tree was ascended and then its
> parent node was compressed. When this node was seen, it could not be
> folded, due to the internal node it contained.
>
> The invariant for compression in this function is: whenever
> nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
> leaf nodes. The compression step currently cannot guarantee this, given
> the corner case shown above.
>
> To fix this issue, retry compression whenever we have retained a node,
> and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
> compression will then allow the node in slot 1 to be folded in,
> satisfying the invariant. Below is the output of the reproducer once the
> fix is applied:
>
>     -- compress node 0x560e9c562380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     internal nodes remain despite enough space, retrying
>     -- compress node 0x560e9c562380 --
>     free=14, leaves=1
>     [0] fold node 2/15 [nx 0]
>     after: 3
>
> Changes
> =======
> DH:
>  - Use false instead of 0.
>  - Reorder the inserted lines in a couple of places to put retained before
>    next_slot.

Thanks for the fixes, sorry I left so many loose ends on this fix! All
looks good to me.

Stephen

>
> ver #2)
>  - Fix typo in pr_devel, correct comparison to "<="
>
>
> Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jarkko Sakkinen <jarkko@kernel.org>
> cc: Andrew Morton <akpm@linux-foundation.org>
> cc: keyrings@vger.kernel.org
> Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
> Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
> ---
>
>  lib/assoc_array.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..ca0b4f360c1a 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c
> @@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	struct assoc_array_ptr *cursor, *ptr;
>  	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
>  	unsigned long nr_leaves_on_tree;
> +	bool retained;
>  	int keylen, slot, nr_free, next_slot, i;
>  
>  	pr_devel("-->%s()\n", __func__);
> @@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
>  		goto descend;
>  	}
>  
> +retry_compress:
>  	pr_devel("-- compress node %p --\n", new_n);
>  
>  	/* Count up the number of empty slots in this node and work out the
> @@ -1553,6 +1555,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
>  
>  	/* See what we can fold in */
> +	retained = false;
>  	next_slot = 0;
>  	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
>  		struct assoc_array_shortcut *s;
> @@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
>  			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
>  				 slot, child->nr_leaves_on_branch, nr_free + 1,
>  				 next_slot);
> +			retained = true;
>  		}
>  	}
>  
> +	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
> +		pr_devel("internal nodes remain despite enough space, retrying\n");
> +		goto retry_compress;
> +	}
>  	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
>  
>  	nr_leaves_on_tree = new_n->nr_leaves_on_branch;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] assoc_array: Fix BUG_ON during garbage collect
  2022-05-19  8:50 [PATCH] " David Howells
@ 2022-05-19 22:17 ` Jarkko Sakkinen
  2022-05-23 17:39 ` Stephen Brennan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Jarkko Sakkinen @ 2022-05-19 22:17 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, stable, Stephen Brennan, Andrew Morton, keyrings, linux-kernel

On Thu, May 19, 2022 at 09:50:30AM +0100, David Howells wrote:
> From: Stephen Brennan <stephen.s.brennan@oracle.com>
> 
> A rare BUG_ON triggered in assoc_array_gc:
> 
>     [3430308.818153] kernel BUG at lib/assoc_array.c:1609!
> 
> Which corresponded to the statement currently at line 1593 upstream:
> 
>     BUG_ON(assoc_array_ptr_is_meta(p));
> 
> Using the data from the core dump, I was able to generate a userspace
> reproducer[1] and determine the cause of the bug.
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
> 
> After running the iterator on the entire branch, an internal tree node
> looked like the following:
> 
>     NODE (nr_leaves_on_branch: 3)
>       SLOT [0] NODE (2 leaves)
>       SLOT [1] NODE (1 leaf)
>       SLOT [2..f] NODE (empty)
> 
> In the userspace reproducer, the pr_devel output when compressing this
> node was:
> 
>     -- compress node 0x5607cc089380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     after: 3
> 
> At slot 0, an internal node with 2 leaves could not be folded into the
> node, because there was only one available slot (slot 0). Thus, the
> internal node was retained. At slot 1, the node had one leaf, and was
> able to be folded in successfully. The remaining nodes had no leaves,
> and so were removed. By the end of the compression stage, there were 14
> free slots, and only 3 leaf nodes. The tree was ascended and then its
> parent node was compressed. When this node was seen, it could not be
> folded, due to the internal node it contained.
> 
> The invariant for compression in this function is: whenever
> nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
> leaf nodes. The compression step currently cannot guarantee this, given
> the corner case shown above.
> 
> To fix this issue, retry compression whenever we have retained a node,
> and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
> compression will then allow the node in slot 1 to be folded in,
> satisfying the invariant. Below is the output of the reproducer once the
> fix is applied:
> 
>     -- compress node 0x560e9c562380 --
>     free=0, leaves=0
>     [0] retain node 2/1 [nx 0]
>     [1] fold node 1/1 [nx 0]
>     [2] fold node 0/1 [nx 2]
>     [3] fold node 0/2 [nx 2]
>     [4] fold node 0/3 [nx 2]
>     [5] fold node 0/4 [nx 2]
>     [6] fold node 0/5 [nx 2]
>     [7] fold node 0/6 [nx 2]
>     [8] fold node 0/7 [nx 2]
>     [9] fold node 0/8 [nx 2]
>     [10] fold node 0/9 [nx 2]
>     [11] fold node 0/10 [nx 2]
>     [12] fold node 0/11 [nx 2]
>     [13] fold node 0/12 [nx 2]
>     [14] fold node 0/13 [nx 2]
>     [15] fold node 0/14 [nx 2]
>     internal nodes remain despite enough space, retrying
>     -- compress node 0x560e9c562380 --
>     free=14, leaves=1
>     [0] fold node 2/15 [nx 0]
>     after: 3
> 
> Changes
> =======
> DH:
>  - Use false instead of 0.
>  - Reorder the inserted lines in a couple of places to put retained before
>    next_slot.
> 
> ver #2)
>  - Fix typo in pr_devel, correct comparison to "<="
> 
> 
> Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Jarkko Sakkinen <jarkko@kernel.org>
> cc: Andrew Morton <akpm@linux-foundation.org>
> cc: keyrings@vger.kernel.org
> Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
> Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
> ---
> 
>  lib/assoc_array.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/assoc_array.c b/lib/assoc_array.c
> index 079c72e26493..ca0b4f360c1a 100644
> --- a/lib/assoc_array.c
> +++ b/lib/assoc_array.c
> @@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	struct assoc_array_ptr *cursor, *ptr;
>  	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
>  	unsigned long nr_leaves_on_tree;
> +	bool retained;
>  	int keylen, slot, nr_free, next_slot, i;
>  
>  	pr_devel("-->%s()\n", __func__);
> @@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
>  		goto descend;
>  	}
>  
> +retry_compress:
>  	pr_devel("-- compress node %p --\n", new_n);
>  
>  	/* Count up the number of empty slots in this node and work out the
> @@ -1553,6 +1555,7 @@ int assoc_array_gc(struct assoc_array *array,
>  	pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
>  
>  	/* See what we can fold in */
> +	retained = false;
>  	next_slot = 0;
>  	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
>  		struct assoc_array_shortcut *s;
> @@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
>  			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
>  				 slot, child->nr_leaves_on_branch, nr_free + 1,
>  				 next_slot);
> +			retained = true;
>  		}
>  	}
>  
> +	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
> +		pr_devel("internal nodes remain despite enough space, retrying\n");
> +		goto retry_compress;
> +	}
>  	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
>  
>  	nr_leaves_on_tree = new_n->nr_leaves_on_branch;
> 
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] assoc_array: Fix BUG_ON during garbage collect
@ 2022-05-19  8:50 David Howells
  2022-05-19 22:17 ` Jarkko Sakkinen
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: David Howells @ 2022-05-19  8:50 UTC (permalink / raw)
  To: torvalds
  Cc: stable, Stephen Brennan, Jarkko Sakkinen, Andrew Morton,
	keyrings, dhowells, linux-kernel

From: Stephen Brennan <stephen.s.brennan@oracle.com>

A rare BUG_ON triggered in assoc_array_gc:

    [3430308.818153] kernel BUG at lib/assoc_array.c:1609!

Which corresponded to the statement currently at line 1593 upstream:

    BUG_ON(assoc_array_ptr_is_meta(p));

Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.

[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc

After running the iterator on the entire branch, an internal tree node
looked like the following:

    NODE (nr_leaves_on_branch: 3)
      SLOT [0] NODE (2 leaves)
      SLOT [1] NODE (1 leaf)
      SLOT [2..f] NODE (empty)

In the userspace reproducer, the pr_devel output when compressing this
node was:

    -- compress node 0x5607cc089380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    after: 3

At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.

The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.

To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 1 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:

    -- compress node 0x560e9c562380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    internal nodes remain despite enough space, retrying
    -- compress node 0x560e9c562380 --
    free=14, leaves=1
    [0] fold node 2/15 [nx 0]
    after: 3

Changes
=======
DH:
 - Use false instead of 0.
 - Reorder the inserted lines in a couple of places to put retained before
   next_slot.

ver #2)
 - Fix typo in pr_devel, correct comparison to "<="


Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko@kernel.org>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
---

 lib/assoc_array.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/assoc_array.c b/lib/assoc_array.c
index 079c72e26493..ca0b4f360c1a 100644
--- a/lib/assoc_array.c
+++ b/lib/assoc_array.c
@@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *array,
 	struct assoc_array_ptr *cursor, *ptr;
 	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
 	unsigned long nr_leaves_on_tree;
+	bool retained;
 	int keylen, slot, nr_free, next_slot, i;
 
 	pr_devel("-->%s()\n", __func__);
@@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
 		goto descend;
 	}
 
+retry_compress:
 	pr_devel("-- compress node %p --\n", new_n);
 
 	/* Count up the number of empty slots in this node and work out the
@@ -1553,6 +1555,7 @@ int assoc_array_gc(struct assoc_array *array,
 	pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
 
 	/* See what we can fold in */
+	retained = false;
 	next_slot = 0;
 	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
 		struct assoc_array_shortcut *s;
@@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
 			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
 				 slot, child->nr_leaves_on_branch, nr_free + 1,
 				 next_slot);
+			retained = true;
 		}
 	}
 
+	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
+		pr_devel("internal nodes remain despite enough space, retrying\n");
+		goto retry_compress;
+	}
 	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
 
 	nr_leaves_on_tree = new_n->nr_leaves_on_branch;



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-02 17:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11 22:55 [PATCH] assoc_array: Fix BUG_ON during garbage collect Stephen Brennan
2022-05-11 23:15 ` Stephen Brennan
2022-05-12 21:49 ` Stephen Brennan
2022-05-12 21:50   ` [PATCH v2] " Stephen Brennan
2022-05-19  8:50 [PATCH] " David Howells
2022-05-19 22:17 ` Jarkko Sakkinen
2022-05-23 17:39 ` Stephen Brennan
2022-06-01 21:59 ` Stephen Brennan
2022-06-02  1:31   ` Linus Torvalds
2022-06-02 17:05     ` Stephen Brennan
2022-06-01 22:41 ` Eric Biggers
2022-06-01 23:03   ` Stephen Brennan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.