* [PATCH bpf] bpf: Improve bucket_log calculation logic
@ 2020-02-07  8:18 ` Martin KaFai Lau
  0 siblings, 0 replies; 12+ messages in thread
From: Martin KaFai Lau @ 2020-02-07  8:18 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, kernel-team,
	Linus Torvalds, Linux-Sparse, Luc Van Oostenryck, netdev,
	Randy Dunlap

It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
exponential effects on the number of states in the sparse checker.

This patch breaks them up by calculating the "nbuckets" first so
that the "bucket_log" only needs to take ilog2().

Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/core/bpf_sk_storage.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 458be6b3eda9..3ab23f698221 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -643,9 +643,10 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
 
+	nbuckets = roundup_pow_of_two(num_possible_cpus());
 	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
-	smap->bucket_log = max_t(u32, 1, ilog2(roundup_pow_of_two(num_possible_cpus())));
-	nbuckets = 1U << smap->bucket_log;
+	nbuckets = max_t(u32, 2, nbuckets);
+	smap->bucket_log = ilog2(nbuckets);
 	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
 
 	ret = bpf_map_charge_init(&smap->map.memory, cost);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread


* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07  8:18 ` Martin KaFai Lau
  (?)
@ 2020-02-07 13:07 ` Luc Van Oostenryck
  -1 siblings, 0 replies; 12+ messages in thread
From: Luc Van Oostenryck @ 2020-02-07 13:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, David Miller,
	kernel-team, Linus Torvalds, Linux-Sparse, netdev, Randy Dunlap

On Fri, Feb 07, 2020 at 12:18:10AM -0800, Martin KaFai Lau wrote:
> 
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 458be6b3eda9..3ab23f698221 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -643,9 +643,10 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>  		return ERR_PTR(-ENOMEM);
>  	bpf_map_init_from_attr(&smap->map, attr);
>  
> +	nbuckets = roundup_pow_of_two(num_possible_cpus());
>  	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
> -	smap->bucket_log = max_t(u32, 1, ilog2(roundup_pow_of_two(num_possible_cpus())));
> -	nbuckets = 1U << smap->bucket_log;
> +	nbuckets = max_t(u32, 2, nbuckets);
> +	smap->bucket_log = ilog2(nbuckets);
>  	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
>  
>  	ret = bpf_map_charge_init(&smap->map.memory, cost);
> -- 

Yes, that's much nicer to read. Feel free to add my

Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>

-- Luc

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07  8:18 ` Martin KaFai Lau
  (?)
  (?)
@ 2020-02-07 18:07 ` Linus Torvalds
  2020-02-07 19:39   ` Linus Torvalds
  2020-02-07 20:13   ` Alexei Starovoitov
  -1 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2020-02-07 18:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, David Miller,
	kernel-team, Linux-Sparse, Luc Van Oostenryck, Netdev,
	Randy Dunlap

On Fri, Feb 7, 2020 at 12:18 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
> exponential effects on the number of states in the sparse checker.

Patch looks good, but I'd like to point out that it's not just sparse.

You can see it with a simple

    make net/core/bpf_sk_storage.i
    grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wc

and see the end result:

      1  365071 2686974

That's one line (the assignment line) that is 2,686,974 characters in length.

Now, sparse does happen to react particularly badly to that (I didn't
look to why, but I suspect it's just that evaluating all the types
that don't actually ever end up getting used ends up being much more
expensive than it should be), but I bet it's not good for gcc either.

I do think this is a good test-case for sparse. Luc, have you looked
at what it is that then makes sparse use *so* much memory for this one
line?

             Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07 18:07 ` Linus Torvalds
@ 2020-02-07 19:39   ` Linus Torvalds
  2020-02-07 20:41     ` Luc Van Oostenryck
  2020-02-07 20:13   ` Alexei Starovoitov
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2020-02-07 19:39 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, David Miller,
	kernel-team, Linux-Sparse, Luc Van Oostenryck, Netdev,
	Randy Dunlap

On Fri, Feb 7, 2020 at 10:07 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I do think this is a good test-case for sparse. Luc, have you looked
> at what it is that then makes sparse use *so* much memory for this one
> line?

Looking at the profile, it's doing a lot of "copy_expression()".

Which comes from inlining.

I think the problem may be that with that macro expansion from hell we
end up with 28968 copies of cpumask_weight(), and sparse will inline
every one of them into the parse tree - even though basically none of
them are _used_.

In fact, it's worse than that: we end up having a few rounds of
inlining thanks to

static inline unsigned int cpumask_weight(const struct cpumask *srcp)
{
        return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
}

static __always_inline int bitmap_weight(const unsigned long *src,
unsigned int nbits)
{
        if (small_const_nbits(nbits))
                return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
        return __bitmap_weight(src, nbits);
}

static __always_inline unsigned long hweight_long(unsigned long w)
{
        return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
}

where those hweight*() things aren't simple either, they end up doing

  #define hweight32(w) (__builtin_constant_p(w) ? __const_hweight32(w)
: __arch_hweight32(w))
  #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w)
: __arch_hweight64(w))

where the __const_hweight*() in turn are more expansions of a macro
with several levels in order to turn it all into a constant value.

So we may have "only" 28968 calls to cpumask_weight(), but it results
in millions of expressions being expanded.

If we did some basic simplification of constant ops before inlining,
that would likely help a lot.

But currently sparse does inline function expansion at type evaluation
time - so long before it does any simplification of the tree at all.

So that explains why sparse happens to react _so_ badly to this thing.
A real compiler would do inlining much later.

Inlining that early is partly because originally one of the design
ideas in sparse was to make inline functions act basically as
templates, so they'd react to the types of the context. But it really
bites us in the ass here.

Luc, any ideas? Yes, this is solvable in the kernel, but it does show
that sparse simply does a _lot_ of unnecessary work.

               Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07 18:07 ` Linus Torvalds
  2020-02-07 19:39   ` Linus Torvalds
@ 2020-02-07 20:13   ` Alexei Starovoitov
  2020-02-07 21:09     ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2020-02-07 20:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Daniel Borkmann,
	David Miller, kernel-team, Linux-Sparse, Luc Van Oostenryck,
	Netdev, Randy Dunlap

On Fri, Feb 07, 2020 at 10:07:32AM -0800, Linus Torvalds wrote:
> On Fri, Feb 7, 2020 at 12:18 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
> > exponential effects on the number of states in the sparse checker.
> 
> Patch looks good, but I'd like to point out that it's not just sparse.
> 
> You can see it with a simple
> 
>     make net/core/bpf_sk_storage.i
>     grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wc
> 
> and see the end result:
> 
>       1  365071 2686974
> 
> That's one line (the assignment line) that is 2,686,974 characters in length.

In addition to this patch I've tried:
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..7363abf60854 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -74,74 +74,76 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
  * Use this where sparse expects a true constant expression, e.g. for array
  * indices.
  */
-#define const_ilog2(n)                         \
-(                                              \
-       __builtin_constant_p(n) ? (             \
-               (n) < 2 ? 0 :                   \
-               (n) & (1ULL << 63) ? 63 :       \
-               (n) & (1ULL << 62) ? 62 :       \
...
+#define __const_ilog2(n, unique_n) ({                  \
+       typeof(n) unique_n = (n);                       \
+       __builtin_constant_p(unique_n) ? (              \
+               (unique_n) < 2 ? 0 :                    \
+               (unique_n) & (1ULL << 63) ? 63 :        \
+               (unique_n) & (1ULL << 62) ? 62 :        \
+               (unique_n) & (1ULL << 61) ? 61 :        \
+               (unique_n) & (1ULL << 60) ? 60 :        \
+               (unique_n) & (1ULL << 59) ? 59 :        \
...
+               (unique_n) & (1ULL <<  3) ?  3 :        \
+               (unique_n) & (1ULL <<  2) ?  2 :        \
                1) :                            \
-       -1)
+       -1; })
+
+#define const_ilog2(n) __const_ilog2(n, __UNIQUE_ID(__n))

and for this nested ilog2() case that caused this explosion
the line got shorter: from 2.6M characters to 144k.
Still a lot.
Unfortunately this approach doesn't work in all cases:
../include/linux/log2.h:77:36: error: braced-group within expression allowed only inside a function
   77 | #define __const_ilog2(n, unique_n) ({   \
      |                                    ^
../include/linux/log2.h:146:24: note: in expansion of macro ‘__const_ilog2’
  146 | #define const_ilog2(n) __const_ilog2(n, __UNIQUE_ID(__n))
      |                        ^~~~~~~~~~~~~
../include/linux/log2.h:161:2: note: in expansion of macro ‘const_ilog2’
  161 |  const_ilog2(n) :  \
      |  ^~~~~~~~~~~
../include/linux/blockgroup_lock.h:14:27: note: in expansion of macro ‘ilog2’
   14 | #define NR_BG_LOCKS (4 << ilog2(NR_CPUS < 32 ? NR_CPUS : 32))
      |                           ^~~~~
../include/linux/blockgroup_lock.h:24:24: note: in expansion of macro ‘NR_BG_LOCKS’
   24 |  struct bgl_lock locks[NR_BG_LOCKS];

Just fyi for folks who're looking at ilog2 and wondering
why it was done this way without ({ })

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07 19:39   ` Linus Torvalds
@ 2020-02-07 20:41     ` Luc Van Oostenryck
  2020-02-07 21:22       ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Luc Van Oostenryck @ 2020-02-07 20:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Daniel Borkmann,
	David Miller, kernel-team, Linux-Sparse, Netdev, Randy Dunlap

On Fri, Feb 07, 2020 at 11:39:24AM -0800, Linus Torvalds wrote:
> On Fri, Feb 7, 2020 at 10:07 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I do think this is a good test-case for sparse. Luc, have you looked
> > at what it is that then makes sparse use *so* much memory for this one
> > line?
> 
> Looking at the profile, it's doing a lot of "copy_expression()".
> 
> Which comes from inlining.
> 
> I think the problem may be that with that macro expansion from hell we
> end up with 28968 copies of cpumask_weight(), and sparse will inline
> every one of them into the parse tree - even though basically none of
> them are _used_.

Yes, indeed. That's just what I saw too.

> In fact, it's worse than that: we end up having a few rounds of
> inlining thanks to

<snip> 

> So we may have "only" 28968 calls to cpumask_weight(), but it results
> in millions of expressions being expanded.

Yes, roughly 1500 expressions per call (:
 
> If we did some basic simplification of constant ops before inlining,
> that would likely help a lot.
> 
> But currently sparse does inline function expansion at type evaluation
> time - so long before it does any simplification of the tree at all.
> 
> So that explains why sparse happens to react _so_ badly to this thing.
> A real compiler would do inlining much later.
> 
> Inlining that early is partly because originally one of the design
> ideas in sparse was to make inline functions act basically as
> templates, so they'd react to the types of the context. But it really
> bites us in the ass here.
> 
> Luc, any ideas? Yes, this is solvable in the kernel, but it does show
> that sparse simply does a _lot_ of unnecessary work.

I never saw it this badly, but it's not the first time I've been bitten
by the very early inlining. Independently of this, it would be handy to
have an inliner at IR level; it shouldn't be very difficult but ...
OTOH, what should really be straightforward would be to separate the
current tree inliner from the type evaluation and instead run it just
after expansion. The downsides would be:
  * the tree would need to be walked once more;
  * this may make the expansion less useful but it could be run again
    after the inlining.

[ If we would like to keep inline-as-template it would just need to be
  able to detect such inlines at type evaluation and only inline those. ]

I'll look more closely at all of it during the WE.

-- Luc

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07 20:13   ` Alexei Starovoitov
@ 2020-02-07 21:09     ` Linus Torvalds
  0 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2020-02-07 21:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Daniel Borkmann,
	David Miller, kernel-team, Linux-Sparse, Luc Van Oostenryck,
	Netdev, Randy Dunlap

On Fri, Feb 7, 2020 at 12:13 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> In addition to this patch I've tried:
> +#define __const_ilog2(n, unique_n) ({                  \
> +       typeof(n) unique_n = (n);                       \

Yeah, as you found out, this doesn't work for the case of having
global initializers or things like array sizes.

Which people do do - often not directly, but through various size macros.

It's annoying, but one of the failures of C is not having a nice form of
compile-time constant handling where you can do slightly smarter
arithmetic. The definition of a "const expression" is very very
limited, and hurts us exactly for the array declaration and constant
initializer case.

Oh well.

          Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07 20:41     ` Luc Van Oostenryck
@ 2020-02-07 21:22       ` Linus Torvalds
       [not found]         ` <20200208235459.xmliqf2ksbre53jj@ltop.local>
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2020-02-07 21:22 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Daniel Borkmann,
	David Miller, kernel-team, Linux-Sparse, Netdev, Randy Dunlap

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

On Fri, Feb 7, 2020 at 12:41 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> I never saw it this badly, but it's not the first time I've been bitten
> by the very early inlining. Independently of this, it would be handy to
> have an inliner at IR level; it shouldn't be very difficult but ...
> OTOH, what should really be straightforward would be to separate the
> current tree inliner from the type evaluation and instead run it just
> after expansion. The downsides would be:
>   * the tree would need to be walked once more;

Actually, if we were to do the inlining _during_ expansion, we
wouldn't add a new phase.

>   * this may make the expansion less useful but it could be run again
>     after the inlining.

Same comment: how about doing it as part of the expansion phase?

This is where we handle the built-ins too, it would kind of make sense
to do inlining in expand_symbol_call(), I feel. An inline function is
a "builtin" that the user has defined, after all.

And if we do it in that phase, we'd automatically avoid it for
conditional expressions with a static conditional value, because
expansion does the obvious trivial simplification as it goes along,
and never expands the side that is trivially not seen.

Something like the attached completely broken patch. It builds but
doesn't work, because "inline_function()" is currently designed to
work during evaluation, not during expansion.

So the patch is complete garbage, but maybe could be the starting
point for something that works.

             Linus

[-- Attachment #2: broken.diff --]
[-- Type: text/x-patch, Size: 1117 bytes --]

diff --git a/evaluate.c b/evaluate.c
index f1a266be..b7bb1f52 100644
--- a/evaluate.c
+++ b/evaluate.c
@@ -3107,22 +3107,6 @@ static int evaluate_symbol_call(struct expression *expr)
 	if (ctype->op && ctype->op->evaluate)
 		return ctype->op->evaluate(expr);
 
-	if (ctype->ctype.modifiers & MOD_INLINE) {
-		int ret;
-		struct symbol *curr = current_fn;
-
-		if (ctype->definition)
-			ctype = ctype->definition;
-
-		current_fn = ctype->ctype.base_type;
-
-		ret = inline_function(expr, ctype);
-
-		/* restore the old function */
-		current_fn = curr;
-		return ret;
-	}
-
 	return 0;
 }
 
diff --git a/expand.c b/expand.c
index 36612c86..a4f26461 100644
--- a/expand.c
+++ b/expand.c
@@ -910,6 +910,15 @@ static int expand_symbol_call(struct expression *expr, int cost)
 	if (fn->type != EXPR_PREOP)
 		return SIDE_EFFECTS;
 
+	if (ctype->ctype.modifiers & MOD_INLINE) {
+		struct symbol *def;
+
+		def = ctype->definition ? ctype->definition : ctype;
+		if (inline_function(expr, def))
+			return expand_expression(expr);
+	}
+
+
 	if (ctype->op && ctype->op->expand)
 		return ctype->op->expand(expr, cost);
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-07  8:18 ` Martin KaFai Lau
                   ` (2 preceding siblings ...)
  (?)
@ 2020-02-07 22:04 ` Daniel Borkmann
  -1 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2020-02-07 22:04 UTC (permalink / raw)
  To: Martin KaFai Lau, bpf
  Cc: Alexei Starovoitov, David Miller, kernel-team, Linus Torvalds,
	Linux-Sparse, Luc Van Oostenryck, netdev, Randy Dunlap

On 2/7/20 9:18 AM, Martin KaFai Lau wrote:
> It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
> exponential effects on the number of states in the sparse checker.
> 
> This patch breaks them up by calculating the "nbuckets" first so
> that the "bucket_log" only needs to take ilog2().
> 
> Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Applied (& improved changelog to clarify it's not just sparse), thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
       [not found]         ` <20200208235459.xmliqf2ksbre53jj@ltop.local>
@ 2020-02-09  0:58           ` Linus Torvalds
  2020-02-09 12:19             ` Luc Van Oostenryck
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2020-02-09  0:58 UTC (permalink / raw)
  To: Luc Van Oostenryck; +Cc: Linux-Sparse

On Sat, Feb 8, 2020 at 3:55 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> > Something like the attached completely broken patch. It builds but
> > doesn't work, because "inline_function()" is currently designed to
> > work during evaluation, not during expansion.
> >
> > So the patch is complete garbage, but maybe could be the starting
> > point for something that works.
>
> It was just missing the part with current_fn (needed for the evaluation)
> and some adjustment to avoid recursive expansions.

Oh, good. I looked at that current_fn thing and decided it couldn't
possibly matter, so my hacked-up patch just dropped it. Blush. And
yeah, the recursion avoidance got broken because it only triggered
during evaluation and that wasn't where it was recursing any more.

Anyway, your fixed patch looks good, and the numbers look lovely. I
don't see why there would sometimes be extra memory use, but the patch
feels like the right thing to do regardless.

Thanks,
               Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf] bpf: Improve bucket_log calculation logic
  2020-02-09  0:58           ` Linus Torvalds
@ 2020-02-09 12:19             ` Luc Van Oostenryck
  0 siblings, 0 replies; 12+ messages in thread
From: Luc Van Oostenryck @ 2020-02-09 12:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-Sparse

On Sat, Feb 08, 2020 at 04:58:51PM -0800, Linus Torvalds wrote:
> 
> Anyway, your fixed patch looks good, and the numbers look lovely. I
> don't see why there would sometimes be extra memory use, but the patch
> feels like the right thing to do regardless.

Yes, I'm quite happy with it now. Thank you for the suggestion.

For the cases with extra memory consumption, I've investigated the
most extreme one and it's quite interesting. The extra memory was
used for basic blocks, instructions and pseudos, so more linearized
code. I reduced it to:

	static inline int get_order(long size)
	{
		return __builtin_constant_p(size) ? 0 : 1;
	}
	
	int foo(void)
	{
		return get_order(0);
	}

Sparse used to not recognize the size as a constant (I don't
understand why but haven't investigated). Strangely, the builtin
without the conditional gave the expected result.

Now, with the patch doing the inlining during expansion, the size
is correctly recognized as a constant, with or without the conditional.
The extra linearized code comes from some complex expression that is
now selected instead of a function call (while reducing, I had the
vague impression that the expression should have expanded further
but I haven't checked that yet).

-- Luc

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-02-09 12:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07  8:18 [PATCH bpf] bpf: Improve bucket_log calculation logic Martin KaFai Lau
2020-02-07  8:18 ` Martin KaFai Lau
2020-02-07 13:07 ` Luc Van Oostenryck
2020-02-07 18:07 ` Linus Torvalds
2020-02-07 19:39   ` Linus Torvalds
2020-02-07 20:41     ` Luc Van Oostenryck
2020-02-07 21:22       ` Linus Torvalds
     [not found]         ` <20200208235459.xmliqf2ksbre53jj@ltop.local>
2020-02-09  0:58           ` Linus Torvalds
2020-02-09 12:19             ` Luc Van Oostenryck
2020-02-07 20:13   ` Alexei Starovoitov
2020-02-07 21:09     ` Linus Torvalds
2020-02-07 22:04 ` Daniel Borkmann
