* [RFC PATCH 0/1] stackdepot hash table autosizing
@ 2022-05-27 11:37 Vlastimil Babka
  2022-05-27 11:37 ` [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing Vlastimil Babka
  2022-06-20 15:02 ` [PATCH] " Vlastimil Babka
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2022-05-27 11:37 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko, Linus Torvalds, linux-mm
  Cc: linux-kernel, Andrey Konovalov, Dmitry Vyukov, kasan-dev,
	Vlastimil Babka

Hi,

in response to Linus [1] I have been looking a bit into using
rhashtables in stackdepot, as stackdepot is gaining new users and the
static, config-option-based hash table size is becoming unwieldy.

With rhashtables it would just scale fully automatically, and we could
also consider creating separate stackdepot instances (each with its own
rhashtable) for each user, as each user has a distinct set of stack
traces to store, so there is no benefit to storing them together.

However, AFAIU stackdepot was initially created for KASAN, and is
sometimes called from restricted contexts, e.g. via
kasan_record_aux_stack_noalloc(), where it must avoid allocations; that
is also why stackdepot uses a raw_spin_lock.

rhashtable offloads allocations to a kthread, so that part is fine, but
it uses a non-raw spinlock, which would be a problem for some of the
stackdepot contexts. It also takes the RCU read lock, while some of the
kasan_record_aux_stack_noalloc() callsites are in the RCU implementation
itself, so that could also be an issue (I haven't investigated it in
detail).
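
As an aside, here is a minimal sketch of what such a restricted-context
caller looks like (assuming the __stack_depot_save() signature as of this
series; the wrapper name below is made up purely for illustration):

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/stacktrace.h>
#include <linux/stackdepot.h>

/*
 * Illustrative only: record a stack without ever allocating, the way
 * the KASAN "noalloc" path does. can_alloc == false means stackdepot
 * may only consume its preallocated pool, so whatever hash table it
 * uses must already exist and be usable under a raw_spin_lock.
 */
static depot_stack_handle_t record_stack_noalloc(void)
{
        unsigned long entries[16];
        unsigned int nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);

        return __stack_depot_save(entries, nr, GFP_NOWAIT, false);
}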

For the SLUB_DEBUG context specifically, rhashtable uses kmalloc() and
variants, so there is a potential recursion issue. While those
allocations are expected to eventually become large enough to be passed
to kmalloc_large() and bypass the slab allocator, we would have to make
sure all of them do.

So my impression is that we could convert some stackdepot users to
rhashtable, but not all. Maybe we could create a second stackdepot API
for those, but KASAN would have to keep using the original one anyway.
As such it makes sense to me to improve the existing API by replacing
the problematic CONFIG_STACK_HASH_ORDER with something that is still
not resizable on the fly, but no longer requires build-time
configuration and automatically picks a size based on system memory.
Hence I'm proposing the following patch, regardless of whether we
proceed with rhashtables for the other stackdepot users.

Vlastimil

[1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/

Vlastimil Babka (1):
  lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing

 lib/Kconfig      |  9 ---------
 lib/stackdepot.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 36 insertions(+), 20 deletions(-)

-- 
2.36.1



* [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-05-27 11:37 [RFC PATCH 0/1] stackdepot hash table autosizing Vlastimil Babka
@ 2022-05-27 11:37 ` Vlastimil Babka
  2022-05-27 12:02   ` Dmitry Vyukov
  2022-06-20 15:02 ` [PATCH] " Vlastimil Babka
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2022-05-27 11:37 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko, Linus Torvalds, linux-mm
  Cc: linux-kernel, Andrey Konovalov, Dmitry Vyukov, kasan-dev,
	Vlastimil Babka

As Linus explained [1], setting the stackdepot hash table size as a
config option is suboptimal, especially as stackdepot becomes a
dependency of less specialized subsystems than initially (e.g. DRM,
networking, SLUB_DEBUG):

: (a) it introduces a new compile-time question that isn't sane to ask
: a regular user, but is now exposed to regular users.

: (b) this by default uses 1MB of memory for a feature that didn't in
: the past, so now if you have small machines you need to make sure you
: make a special kernel config for them.

Ideally we would employ rhashtable for fully automatic resizing, which
should be feasible for many of the new users, but problematic for the
original users with restricted context that call __stack_depot_save()
with can_alloc == false, i.e. KASAN.

However we can easily remove the config option and scale the hash table
automatically with system memory. The STACK_HASH_MASK constant becomes
stack_hash_mask variable and is used only in one mask operation, so the
overhead should be negligible to none. For early allocation we can
employ the existing alloc_large_system_hash() function and perform
similar scaling for the late allocation.

The existing limits of the config option (between 4k and 1M buckets)
are preserved, and the scaling factor is set to one bucket per 16kB of
memory, so on 64-bit the maximum of 1M buckets (an 8MB table) is
reached with a 16GB system, while a 1GB system will use a 512kB table.
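
To illustrate the scaling, a standalone userspace sketch (not part of
the patch; it mirrors the constants below and ignores the rounding that
nr_free_buffer_pages() introduces):

#include <stdio.h>

/* copies of the patch's constants, for illustration only */
#define STACK_HASH_SCALE      14   /* one bucket per 16kB of memory */
#define STACK_HASH_ORDER_MIN  12   /* at least 4k buckets */
#define STACK_HASH_ORDER_MAX  20   /* at most 1M buckets */

int main(void)
{
        unsigned long long mem_gb[] = { 1, 2, 4, 16, 64 };

        for (int i = 0; i < 5; i++) {
                unsigned long long buckets = (mem_gb[i] << 30) >> STACK_HASH_SCALE;

                if (buckets < (1ULL << STACK_HASH_ORDER_MIN))
                        buckets = 1ULL << STACK_HASH_ORDER_MIN;
                if (buckets > (1ULL << STACK_HASH_ORDER_MAX))
                        buckets = 1ULL << STACK_HASH_ORDER_MAX;

                /* 8 bytes per bucket (one pointer) on 64-bit */
                printf("%4lluGB -> %7llu buckets, %5llu kB table\n",
                       mem_gb[i], buckets, buckets * 8 / 1024);
        }
        return 0;
}

This prints 65536 buckets (512 kB) for 1GB and 1048576 buckets (8192 kB)
for 16GB and above, matching the numbers given above.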

If needed, the automatic scaling could be complemented with a boot-time
kernel parameter, but it feels pointless to add it without a specific
use case.

[1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 lib/Kconfig      |  9 ---------
 lib/stackdepot.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 6a843639814f..1e7cf7c76ae6 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -682,15 +682,6 @@ config STACKDEPOT_ALWAYS_INIT
 	bool
 	select STACKDEPOT
 
-config STACK_HASH_ORDER
-	int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
-	range 12 20
-	default 20
-	depends on STACKDEPOT
-	help
-	 Select the hash size as a power of 2 for the stackdepot hash table.
-	 Choose a lower value to reduce the memory impact.
-
 config REF_TRACKER
 	bool
 	depends on STACKTRACE_SUPPORT
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5ca0d086ef4a..f7b73ddfca77 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -145,10 +145,15 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	return stack;
 }
 
-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
+/* one hash table bucket entry per 16kB of memory */
+#define STACK_HASH_SCALE	14
+/* limited between 4k and 1M buckets */
+#define STACK_HASH_ORDER_MIN	12
+#define STACK_HASH_ORDER_MAX	20
 #define STACK_HASH_SEED 0x9747b28c
 
+static unsigned int stack_hash_mask;
+
 static bool stack_depot_disable;
 static struct stack_record **stack_table;
 
@@ -175,8 +180,6 @@ void __init stack_depot_want_early_init(void)
 
 int __init stack_depot_early_init(void)
 {
-	size_t size;
-
 	/* This is supposed to be called only once, from mm_init() */
 	if (WARN_ON(__stack_depot_early_init_passed))
 		return 0;
@@ -186,10 +189,15 @@ int __init stack_depot_early_init(void)
 	if (!__stack_depot_want_early_init || stack_depot_disable)
 		return 0;
 
-	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
-	pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
-		size);
-	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+	stack_table = alloc_large_system_hash("stackdepot",
+						sizeof(struct stack_record *),
+						0,
+						STACK_HASH_SCALE,
+						HASH_EARLY | HASH_ZERO,
+						NULL,
+						&stack_hash_mask,
+						1UL << STACK_HASH_ORDER_MIN,
+						1UL << STACK_HASH_ORDER_MAX);
 
 	if (!stack_table) {
 		pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -207,13 +215,30 @@ int stack_depot_init(void)
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		pr_info("Stack Depot allocating hash table with kvcalloc\n");
-		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
+		unsigned long entries;
+
+		entries = nr_free_buffer_pages();
+		entries = roundup_pow_of_two(entries);
+
+		if (STACK_HASH_SCALE > PAGE_SHIFT)
+			entries >>= (STACK_HASH_SCALE - PAGE_SHIFT);
+		else
+			entries <<= (PAGE_SHIFT - STACK_HASH_SCALE);
+
+		if (entries < 1UL << STACK_HASH_ORDER_MIN)
+			entries = 1UL << STACK_HASH_ORDER_MIN;
+		if (entries > 1UL << STACK_HASH_ORDER_MAX)
+			entries = 1UL << STACK_HASH_ORDER_MAX;
+
+		pr_info("Stack Depot allocating hash table of %lu entries with kvcalloc\n",
+				entries);
+		stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
 		if (!stack_table) {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
 			stack_depot_disable = true;
 			ret = -ENOMEM;
 		}
+		stack_hash_mask = entries - 1;
 	}
 	mutex_unlock(&stack_depot_init_mutex);
 	return ret;
@@ -386,7 +411,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		goto fast_exit;
 
 	hash = hash_stack(entries, nr_entries);
-	bucket = &stack_table[hash & STACK_HASH_MASK];
+	bucket = &stack_table[hash & stack_hash_mask];
 
 	/*
 	 * Fast path: look the stack trace up without locking.
-- 
2.36.1



* Re: [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-05-27 11:37 ` [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing Vlastimil Babka
@ 2022-05-27 12:02   ` Dmitry Vyukov
  2022-06-20 12:00     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2022-05-27 12:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Marco Elver, Alexander Potapenko, Linus Torvalds, linux-mm,
	linux-kernel, Andrey Konovalov, kasan-dev

On Fri, 27 May 2022 at 13:37, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> As Linus explained [1], setting the stackdepot hash table size as a
> config option is suboptimal, especially as stackdepot becomes a
> dependency of less specialized subsystems than initially (e.g. DRM,
> networking, SLUB_DEBUG):
>
> : (a) it introduces a new compile-time question that isn't sane to ask
> : a regular user, but is now exposed to regular users.
>
> : (b) this by default uses 1MB of memory for a feature that didn't in
> : the past, so now if you have small machines you need to make sure you
> : make a special kernel config for them.
>
> Ideally we would employ rhashtable for fully automatic resizing, which
> should be feasible for many of the new users, but problematic for the
> original users with restricted context that call __stack_depot_save()
> with can_alloc == false, i.e. KASAN.
>
> However we can easily remove the config option and scale the hash table
> automatically with system memory. The STACK_HASH_MASK constant becomes
> stack_hash_mask variable and is used only in one mask operation, so the
> overhead should be negligible to none. For early allocation we can
> employ the existing alloc_large_system_hash() function and perform
> similar scaling for the late allocation.
>
> The existing limits of the config option (between 4k and 1M buckets)
> are preserved, and scaling factor is set to one bucket per 16kB memory
> so on 64bit the max 1M buckets (8MB memory) is achieved with 16GB
> system, while a 1GB system will use 512kB.

Hi Vlastimil,

We use KASAN with VMs with 2GB of memory.
If I did the math correctly this will result in 128K entries, while
currently we have CONFIG_STACK_HASH_ORDER=20 even for arm32.
I am actually not sure how full the table gets, but we can fuzz a
large kernel for up to an hour, so we can get lots of stacks (we were
the only known users who routinely overflowed default LOCKDEP tables
:)).
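
(Working that through with the patch's constants: 2GB / 16kB per bucket
= 131072 = 128K buckets, i.e. 1MB of bucket pointers on 64-bit, versus
the 8MB that a fixed order of 20 gives today.)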

I am not opposed to this in general. And I understand that KASAN is
different from the other users.
What do you think about allowing CONFIG_STACK_HASH_ORDER=0 (or unset)
to mean auto-size, while keeping the ability to set an exact size as
well?
Or alternatively auto-size if KASAN is not enabled and use a large
table otherwise? But I am not sure if anybody used
CONFIG_STACK_HASH_ORDER to reduce the default size with KASAN...



> If needed, the automatic scaling could be complemented with a boot-time
> kernel parameter, but it feels pointless to add it without a specific
> use case.
>
> [1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  lib/Kconfig      |  9 ---------
>  lib/stackdepot.c | 47 ++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 6a843639814f..1e7cf7c76ae6 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -682,15 +682,6 @@ config STACKDEPOT_ALWAYS_INIT
>         bool
>         select STACKDEPOT
>
> -config STACK_HASH_ORDER
> -       int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
> -       range 12 20
> -       default 20
> -       depends on STACKDEPOT
> -       help
> -        Select the hash size as a power of 2 for the stackdepot hash table.
> -        Choose a lower value to reduce the memory impact.
> -
>  config REF_TRACKER
>         bool
>         depends on STACKTRACE_SUPPORT
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ca0d086ef4a..f7b73ddfca77 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -145,10 +145,15 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         return stack;
>  }
>
> -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> +/* one hash table bucket entry per 16kB of memory */
> +#define STACK_HASH_SCALE       14
> +/* limited between 4k and 1M buckets */
> +#define STACK_HASH_ORDER_MIN   12
> +#define STACK_HASH_ORDER_MAX   20
>  #define STACK_HASH_SEED 0x9747b28c
>
> +static unsigned int stack_hash_mask;
> +
>  static bool stack_depot_disable;
>  static struct stack_record **stack_table;
>
> @@ -175,8 +180,6 @@ void __init stack_depot_want_early_init(void)
>
>  int __init stack_depot_early_init(void)
>  {
> -       size_t size;
> -
>         /* This is supposed to be called only once, from mm_init() */
>         if (WARN_ON(__stack_depot_early_init_passed))
>                 return 0;
> @@ -186,10 +189,15 @@ int __init stack_depot_early_init(void)
>         if (!__stack_depot_want_early_init || stack_depot_disable)
>                 return 0;
>
> -       size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> -       pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
> -               size);
> -       stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> +       stack_table = alloc_large_system_hash("stackdepot",
> +                                               sizeof(struct stack_record *),
> +                                               0,
> +                                               STACK_HASH_SCALE,
> +                                               HASH_EARLY | HASH_ZERO,
> +                                               NULL,
> +                                               &stack_hash_mask,
> +                                               1UL << STACK_HASH_ORDER_MIN,
> +                                               1UL << STACK_HASH_ORDER_MAX);
>
>         if (!stack_table) {
>                 pr_err("Stack Depot hash table allocation failed, disabling\n");
> @@ -207,13 +215,30 @@ int stack_depot_init(void)
>
>         mutex_lock(&stack_depot_init_mutex);
>         if (!stack_depot_disable && !stack_table) {
> -               pr_info("Stack Depot allocating hash table with kvcalloc\n");
> -               stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
> +               unsigned long entries;
> +
> +               entries = nr_free_buffer_pages();
> +               entries = roundup_pow_of_two(entries);
> +
> +               if (STACK_HASH_SCALE > PAGE_SHIFT)
> +                       entries >>= (STACK_HASH_SCALE - PAGE_SHIFT);
> +               else
> +                       entries <<= (PAGE_SHIFT - STACK_HASH_SCALE);
> +
> +               if (entries < 1UL << STACK_HASH_ORDER_MIN)
> +                       entries = 1UL << STACK_HASH_ORDER_MIN;
> +               if (entries > 1UL << STACK_HASH_ORDER_MAX)
> +                       entries = 1UL << STACK_HASH_ORDER_MAX;
> +
> +               pr_info("Stack Depot allocating hash table of %lu entries with kvcalloc\n",
> +                               entries);
> +               stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
>                 if (!stack_table) {
>                         pr_err("Stack Depot hash table allocation failed, disabling\n");
>                         stack_depot_disable = true;
>                         ret = -ENOMEM;
>                 }
> +               stack_hash_mask = entries - 1;
>         }
>         mutex_unlock(&stack_depot_init_mutex);
>         return ret;
> @@ -386,7 +411,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 goto fast_exit;
>
>         hash = hash_stack(entries, nr_entries);
> -       bucket = &stack_table[hash & STACK_HASH_MASK];
> +       bucket = &stack_table[hash & stack_hash_mask];
>
>         /*
>          * Fast path: look the stack trace up without locking.
> --
> 2.36.1
>


* Re: [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-05-27 12:02   ` Dmitry Vyukov
@ 2022-06-20 12:00     ` Vlastimil Babka
  2022-06-20 12:02       ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2022-06-20 12:00 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Marco Elver, Alexander Potapenko, Linus Torvalds, linux-mm,
	linux-kernel, Andrey Konovalov, kasan-dev

On 5/27/22 14:02, Dmitry Vyukov wrote:
> On Fri, 27 May 2022 at 13:37, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> As Linus explained [1], setting the stackdepot hash table size as a
>> config option is suboptimal, especially as stackdepot becomes a
>> dependency of less specialized subsystems than initially (e.g. DRM,
>> networking, SLUB_DEBUG):
>>
>> : (a) it introduces a new compile-time question that isn't sane to ask
>> : a regular user, but is now exposed to regular users.
>>
>> : (b) this by default uses 1MB of memory for a feature that didn't in
>> : the past, so now if you have small machines you need to make sure you
>> : make a special kernel config for them.
>>
>> Ideally we would employ rhashtable for fully automatic resizing, which
>> should be feasible for many of the new users, but problematic for the
>> original users with restricted context that call __stack_depot_save()
>> with can_alloc == false, i.e. KASAN.
>>
>> However we can easily remove the config option and scale the hash table
>> automatically with system memory. The STACK_HASH_MASK constant becomes
>> stack_hash_mask variable and is used only in one mask operation, so the
>> overhead should be negligible to none. For early allocation we can
>> employ the existing alloc_large_system_hash() function and perform
>> similar scaling for the late allocation.
>>
>> The existing limits of the config option (between 4k and 1M buckets)
>> are preserved, and scaling factor is set to one bucket per 16kB memory
>> so on 64bit the max 1M buckets (8MB memory) is achieved with 16GB
>> system, while a 1GB system will use 512kB.
> 
> Hi Vlastimil,
> 
> We use KASAN with VMs with 2GB of memory.
> If I did the math correctly this will result in 128K entries, while
> currently we have CONFIG_STACK_HASH_ORDER=20 even for arm32.
> I am actually not sure how full the table gets, but we can fuzz a
> large kernel for up to an hour, so we can get lots of stacks (we were
> the only known users who routinely overflowed default LOCKDEP tables
> :)).

Aha, good to know the order of 20 has some real use case then :)

> I am not opposed to this in general. And I understand that KASAN Is
> different from the other users.
> What do you think re allowing CONFIG_STACK_HASH_ORDER=0/is not set
> which will mean auto-size, but keeping ability to set exact size as
> well?
> Or alternatively auto-size if KASAN is not enabled and use a large
> table otherwise? But I am not sure if anybody used
> CONFIG_STACK_HASH_ORDER to reduce the default size with KASAN...

Well if you're unsure and nobody else requested it so far, we could try
setting it to 20 when KASAN is enabled, and autosize otherwise. If somebody
comes up with a use-case for the boot-time parameter override (instead of
CONFIG_), we can add it then?
>> If needed, the automatic scaling could be complemented with a boot-time
>> kernel parameter, but it feels pointless to add it without a specific
>> use case.


* Re: [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-06-20 12:00     ` Vlastimil Babka
@ 2022-06-20 12:02       ` Dmitry Vyukov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Vyukov @ 2022-06-20 12:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Marco Elver, Alexander Potapenko, Linus Torvalds, linux-mm,
	linux-kernel, Andrey Konovalov, kasan-dev

On Mon, 20 Jun 2022 at 14:00, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 5/27/22 14:02, Dmitry Vyukov wrote:
> > On Fri, 27 May 2022 at 13:37, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> As Linus explained [1], setting the stackdepot hash table size as a
> >> config option is suboptimal, especially as stackdepot becomes a
> >> dependency of less specialized subsystems than initially (e.g. DRM,
> >> networking, SLUB_DEBUG):
> >>
> >> : (a) it introduces a new compile-time question that isn't sane to ask
> >> : a regular user, but is now exposed to regular users.
> >>
> >> : (b) this by default uses 1MB of memory for a feature that didn't in
> >> : the past, so now if you have small machines you need to make sure you
> >> : make a special kernel config for them.
> >>
> >> Ideally we would employ rhashtable for fully automatic resizing, which
> >> should be feasible for many of the new users, but problematic for the
> >> original users with restricted context that call __stack_depot_save()
> >> with can_alloc == false, i.e. KASAN.
> >>
> >> However we can easily remove the config option and scale the hash table
> >> automatically with system memory. The STACK_HASH_MASK constant becomes
> >> stack_hash_mask variable and is used only in one mask operation, so the
> >> overhead should be negligible to none. For early allocation we can
> >> employ the existing alloc_large_system_hash() function and perform
> >> similar scaling for the late allocation.
> >>
> >> The existing limits of the config option (between 4k and 1M buckets)
> >> are preserved, and scaling factor is set to one bucket per 16kB memory
> >> so on 64bit the max 1M buckets (8MB memory) is achieved with 16GB
> >> system, while a 1GB system will use 512kB.
> >
> > Hi Vlastimil,
> >
> > We use KASAN with VMs with 2GB of memory.
> > If I did the math correctly this will result in 128K entries, while
> > currently we have CONFIG_STACK_HASH_ORDER=20 even for arm32.
> > I am actually not sure how full the table gets, but we can fuzz a
> > large kernel for up to an hour, so we can get lots of stacks (we were
> > the only known users who routinely overflowed default LOCKDEP tables
> > :)).
>
> Aha, good to know the order of 20 has some real use case then :)
>
> > I am not opposed to this in general. And I understand that KASAN Is
> > different from the other users.
> > What do you think re allowing CONFIG_STACK_HASH_ORDER=0/is not set
> > which will mean auto-size, but keeping ability to set exact size as
> > well?
> > Or alternatively auto-size if KASAN is not enabled and use a large
> > table otherwise? But I am not sure if anybody used
> > CONFIG_STACK_HASH_ORDER to reduce the default size with KASAN...
>
> Well if you're unsure and nobody else requested it so far, we could try
> setting it to 20 when KASAN is enabled, and autosize otherwise. If somebody
> comes up with a use-case for the boot-time parameter override (instead of
> CONFIG_), we can add it then?
> >> If needed, the automatic scaling could be complemented with a boot-time
> >> kernel parameter, but it feels pointless to add it without a specific
> >> use case.

Works for me.


* [PATCH] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-05-27 11:37 [RFC PATCH 0/1] stackdepot hash table autosizing Vlastimil Babka
  2022-05-27 11:37 ` [RFC PATCH 1/1] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing Vlastimil Babka
@ 2022-06-20 15:02 ` Vlastimil Babka
  2022-06-21  7:37   ` Dmitry Vyukov
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2022-06-20 15:02 UTC (permalink / raw)
  To: Marco Elver, Alexander Potapenko, Andrew Morton
  Cc: Linus Torvalds, linux-mm, linux-kernel, patches,
	Andrey Konovalov, Dmitry Vyukov, kasan-dev, Vlastimil Babka

As Linus explained [1], setting the stackdepot hash table size as a
config option is suboptimal, especially as stackdepot becomes a
dependency of less "expert" subsystems than initially (e.g. DRM,
networking, SLUB_DEBUG):

: (a) it introduces a new compile-time question that isn't sane to ask
: a regular user, but is now exposed to regular users.

: (b) this by default uses 1MB of memory for a feature that didn't in
: the past, so now if you have small machines you need to make sure you
: make a special kernel config for them.

Ideally we would employ rhashtable for fully automatic resizing, which
should be feasible for many of the new users, but problematic for the
original users with restricted context that call __stack_depot_save()
with can_alloc == false, i.e. KASAN.

However we can easily remove the config option and scale the hash table
automatically with system memory. The STACK_HASH_MASK constant becomes
stack_hash_mask variable and is used only in one mask operation, so the
overhead should be negligible to none. For early allocation we can
employ the existing alloc_large_system_hash() function and perform
similar scaling for the late allocation.

The existing limits of the config option (between 4k and 1M buckets)
are preserved, and the scaling factor is set to one bucket per 16kB of
memory, so on 64-bit the maximum of 1M buckets (an 8MB table) is
reached with a 16GB system, while a 1GB system will use a 512kB table.

Because KASAN is reported to need the maximum number of buckets even
with smaller amounts of memory [2], set it as such when kasan_enabled().

If needed, the automatic scaling could be complemented with a boot-time
kernel parameter, but it feels pointless to add it without a specific
use case.

[1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/
[2] https://lore.kernel.org/all/CACT4Y+Y4GZfXOru2z5tFPzFdaSUd+GFc6KVL=bsa0+1m197cQQ@mail.gmail.com/

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 lib/Kconfig      |  9 --------
 lib/stackdepot.c | 59 ++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index eaaad4d85bf2..986ea474836c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -685,15 +685,6 @@ config STACKDEPOT_ALWAYS_INIT
 	bool
 	select STACKDEPOT
 
-config STACK_HASH_ORDER
-	int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
-	range 12 20
-	default 20
-	depends on STACKDEPOT
-	help
-	 Select the hash size as a power of 2 for the stackdepot hash table.
-	 Choose a lower value to reduce the memory impact.
-
 config REF_TRACKER
 	bool
 	depends on STACKTRACE_SUPPORT
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5ca0d086ef4a..e73fda23388d 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -32,6 +32,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/memblock.h>
+#include <linux/kasan-enabled.h>
 
 #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
 
@@ -145,10 +146,16 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 	return stack;
 }
 
-#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
-#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
+/* one hash table bucket entry per 16kB of memory */
+#define STACK_HASH_SCALE	14
+/* limited between 4k and 1M buckets */
+#define STACK_HASH_ORDER_MIN	12
+#define STACK_HASH_ORDER_MAX	20
 #define STACK_HASH_SEED 0x9747b28c
 
+static unsigned int stack_hash_order;
+static unsigned int stack_hash_mask;
+
 static bool stack_depot_disable;
 static struct stack_record **stack_table;
 
@@ -175,7 +182,7 @@ void __init stack_depot_want_early_init(void)
 
 int __init stack_depot_early_init(void)
 {
-	size_t size;
+	unsigned long entries = 0;
 
 	/* This is supposed to be called only once, from mm_init() */
 	if (WARN_ON(__stack_depot_early_init_passed))
@@ -183,13 +190,23 @@ int __init stack_depot_early_init(void)
 
 	__stack_depot_early_init_passed = true;
 
+	if (kasan_enabled() && !stack_hash_order)
+		stack_hash_order = STACK_HASH_ORDER_MAX;
+
 	if (!__stack_depot_want_early_init || stack_depot_disable)
 		return 0;
 
-	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
-	pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
-		size);
-	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
+	if (stack_hash_order)
+		entries = 1UL <<  stack_hash_order;
+	stack_table = alloc_large_system_hash("stackdepot",
+						sizeof(struct stack_record *),
+						entries,
+						STACK_HASH_SCALE,
+						HASH_EARLY | HASH_ZERO,
+						NULL,
+						&stack_hash_mask,
+						1UL << STACK_HASH_ORDER_MIN,
+						1UL << STACK_HASH_ORDER_MAX);
 
 	if (!stack_table) {
 		pr_err("Stack Depot hash table allocation failed, disabling\n");
@@ -207,13 +224,35 @@ int stack_depot_init(void)
 
 	mutex_lock(&stack_depot_init_mutex);
 	if (!stack_depot_disable && !stack_table) {
-		pr_info("Stack Depot allocating hash table with kvcalloc\n");
-		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
+		unsigned long entries;
+		int scale = STACK_HASH_SCALE;
+
+		if (stack_hash_order) {
+			entries = 1UL << stack_hash_order;
+		} else {
+			entries = nr_free_buffer_pages();
+			entries = roundup_pow_of_two(entries);
+
+			if (scale > PAGE_SHIFT)
+				entries >>= (scale - PAGE_SHIFT);
+			else
+				entries <<= (PAGE_SHIFT - scale);
+		}
+
+		if (entries < 1UL << STACK_HASH_ORDER_MIN)
+			entries = 1UL << STACK_HASH_ORDER_MIN;
+		if (entries > 1UL << STACK_HASH_ORDER_MAX)
+			entries = 1UL << STACK_HASH_ORDER_MAX;
+
+		pr_info("Stack Depot allocating hash table of %lu entries with kvcalloc\n",
+				entries);
+		stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
 		if (!stack_table) {
 			pr_err("Stack Depot hash table allocation failed, disabling\n");
 			stack_depot_disable = true;
 			ret = -ENOMEM;
 		}
+		stack_hash_mask = entries - 1;
 	}
 	mutex_unlock(&stack_depot_init_mutex);
 	return ret;
@@ -386,7 +425,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 		goto fast_exit;
 
 	hash = hash_stack(entries, nr_entries);
-	bucket = &stack_table[hash & STACK_HASH_MASK];
+	bucket = &stack_table[hash & stack_hash_mask];
 
 	/*
 	 * Fast path: look the stack trace up without locking.
-- 
2.36.1



* Re: [PATCH] lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
  2022-06-20 15:02 ` [PATCH] " Vlastimil Babka
@ 2022-06-21  7:37   ` Dmitry Vyukov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Vyukov @ 2022-06-21  7:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Marco Elver, Alexander Potapenko, Andrew Morton, Linus Torvalds,
	linux-mm, linux-kernel, patches, Andrey Konovalov, kasan-dev

On Mon, 20 Jun 2022 at 17:03, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> As Linus explained [1], setting the stackdepot hash table size as a
> config option is suboptimal, especially as stackdepot becomes a
> dependency of less "expert" subsystems than initially (e.g. DRM,
> networking, SLUB_DEBUG):
>
> : (a) it introduces a new compile-time question that isn't sane to ask
> : a regular user, but is now exposed to regular users.
>
> : (b) this by default uses 1MB of memory for a feature that didn't in
> : the past, so now if you have small machines you need to make sure you
> : make a special kernel config for them.
>
> Ideally we would employ rhashtable for fully automatic resizing, which
> should be feasible for many of the new users, but problematic for the
> original users with restricted context that call __stack_depot_save()
> with can_alloc == false, i.e. KASAN.
>
> However we can easily remove the config option and scale the hash table
> automatically with system memory. The STACK_HASH_MASK constant becomes
> stack_hash_mask variable and is used only in one mask operation, so the
> overhead should be negligible to none. For early allocation we can
> employ the existing alloc_large_system_hash() function and perform
> similar scaling for the late allocation.
>
> The existing limits of the config option (between 4k and 1M buckets)
> are preserved, and scaling factor is set to one bucket per 16kB memory
> so on 64bit the max 1M buckets (8MB memory) is achieved with 16GB
> system, while a 1GB system will use 512kB.
>
> Because KASAN is reported to need the maximum number of buckets even
> with smaller amounts of memory [2], set it as such when kasan_enabled().
>
> If needed, the automatic scaling could be complemented with a boot-time
> kernel parameter, but it feels pointless to add it without a specific
> use case.
>
> [1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/
> [2] https://lore.kernel.org/all/CACT4Y+Y4GZfXOru2z5tFPzFdaSUd+GFc6KVL=bsa0+1m197cQQ@mail.gmail.com/
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Dmitry Vyukov <dvyukov@google.com>

> ---
>  lib/Kconfig      |  9 --------
>  lib/stackdepot.c | 59 ++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/lib/Kconfig b/lib/Kconfig
> index eaaad4d85bf2..986ea474836c 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -685,15 +685,6 @@ config STACKDEPOT_ALWAYS_INIT
>         bool
>         select STACKDEPOT
>
> -config STACK_HASH_ORDER
> -       int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
> -       range 12 20
> -       default 20
> -       depends on STACKDEPOT
> -       help
> -        Select the hash size as a power of 2 for the stackdepot hash table.
> -        Choose a lower value to reduce the memory impact.
> -
>  config REF_TRACKER
>         bool
>         depends on STACKTRACE_SUPPORT
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index 5ca0d086ef4a..e73fda23388d 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -32,6 +32,7 @@
>  #include <linux/string.h>
>  #include <linux/types.h>
>  #include <linux/memblock.h>
> +#include <linux/kasan-enabled.h>
>
>  #define DEPOT_STACK_BITS (sizeof(depot_stack_handle_t) * 8)
>
> @@ -145,10 +146,16 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
>         return stack;
>  }
>
> -#define STACK_HASH_SIZE (1L << CONFIG_STACK_HASH_ORDER)
> -#define STACK_HASH_MASK (STACK_HASH_SIZE - 1)
> +/* one hash table bucket entry per 16kB of memory */
> +#define STACK_HASH_SCALE       14
> +/* limited between 4k and 1M buckets */
> +#define STACK_HASH_ORDER_MIN   12
> +#define STACK_HASH_ORDER_MAX   20
>  #define STACK_HASH_SEED 0x9747b28c
>
> +static unsigned int stack_hash_order;
> +static unsigned int stack_hash_mask;
> +
>  static bool stack_depot_disable;
>  static struct stack_record **stack_table;
>
> @@ -175,7 +182,7 @@ void __init stack_depot_want_early_init(void)
>
>  int __init stack_depot_early_init(void)
>  {
> -       size_t size;
> +       unsigned long entries = 0;
>
>         /* This is supposed to be called only once, from mm_init() */
>         if (WARN_ON(__stack_depot_early_init_passed))
> @@ -183,13 +190,23 @@ int __init stack_depot_early_init(void)
>
>         __stack_depot_early_init_passed = true;
>
> +       if (kasan_enabled() && !stack_hash_order)
> +               stack_hash_order = STACK_HASH_ORDER_MAX;
> +
>         if (!__stack_depot_want_early_init || stack_depot_disable)
>                 return 0;
>
> -       size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
> -       pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
> -               size);
> -       stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
> +       if (stack_hash_order)
> +               entries = 1UL <<  stack_hash_order;
> +       stack_table = alloc_large_system_hash("stackdepot",
> +                                               sizeof(struct stack_record *),
> +                                               entries,
> +                                               STACK_HASH_SCALE,
> +                                               HASH_EARLY | HASH_ZERO,
> +                                               NULL,
> +                                               &stack_hash_mask,
> +                                               1UL << STACK_HASH_ORDER_MIN,
> +                                               1UL << STACK_HASH_ORDER_MAX);
>
>         if (!stack_table) {
>                 pr_err("Stack Depot hash table allocation failed, disabling\n");
> @@ -207,13 +224,35 @@ int stack_depot_init(void)
>
>         mutex_lock(&stack_depot_init_mutex);
>         if (!stack_depot_disable && !stack_table) {
> -               pr_info("Stack Depot allocating hash table with kvcalloc\n");
> -               stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
> +               unsigned long entries;
> +               int scale = STACK_HASH_SCALE;
> +
> +               if (stack_hash_order) {
> +                       entries = 1UL << stack_hash_order;
> +               } else {
> +                       entries = nr_free_buffer_pages();
> +                       entries = roundup_pow_of_two(entries);
> +
> +                       if (scale > PAGE_SHIFT)
> +                               entries >>= (scale - PAGE_SHIFT);
> +                       else
> +                               entries <<= (PAGE_SHIFT - scale);
> +               }
> +
> +               if (entries < 1UL << STACK_HASH_ORDER_MIN)
> +                       entries = 1UL << STACK_HASH_ORDER_MIN;
> +               if (entries > 1UL << STACK_HASH_ORDER_MAX)
> +                       entries = 1UL << STACK_HASH_ORDER_MAX;
> +
> +               pr_info("Stack Depot allocating hash table of %lu entries with kvcalloc\n",
> +                               entries);
> +               stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
>                 if (!stack_table) {
>                         pr_err("Stack Depot hash table allocation failed, disabling\n");
>                         stack_depot_disable = true;
>                         ret = -ENOMEM;
>                 }
> +               stack_hash_mask = entries - 1;
>         }
>         mutex_unlock(&stack_depot_init_mutex);
>         return ret;
> @@ -386,7 +425,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
>                 goto fast_exit;
>
>         hash = hash_stack(entries, nr_entries);
> -       bucket = &stack_table[hash & STACK_HASH_MASK];
> +       bucket = &stack_table[hash & stack_hash_mask];
>
>         /*
>          * Fast path: look the stack trace up without locking.
> --
> 2.36.1
>

