Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
@ 2020-08-01  4:57 Brian Vazquez
  2020-08-01 16:58 ` Yonghong Song
  0 siblings, 1 reply; 3+ messages in thread
From: Brian Vazquez @ 2020-08-01  4:57 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: linux-kernel, netdev, bpf, Luigi Rizzo, Yonghong Song

While running some experiments it was observed that map_lookup_batch was much
slower than get_next_key + lookup when the syscall overhead is minimal.
This was because the map_lookup_batch implementation was more expensive
traversing empty buckets, this can be really costly when the pre-allocated
map is too big.

This patch optimizes the case when the bucket is empty so we can move quickly
to next bucket.

The benchmark to exercise this is as follows:

-The map was populate with a single entry to make sure that the syscall overhead
is not helping the map_batch_lookup.
-The size of the preallocated map was increased to show the effect of
traversing empty buckets.

Results:

  Using get_next_key + lookup:

  Benchmark                Time(ns)        CPU(ns)     Iteration
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          3593           3586         192680
  BM_DumpHashMap/1/4k          6004           5972         100000
  BM_DumpHashMap/1/16k        15755          15710          44341
  BM_DumpHashMap/1/64k        59525          59376          10000

  Using htab_lookup_batch before this patch:
  Benchmark                Time(ns)        CPU(ns)     Iterations
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          3933           3927         177978
  BM_DumpHashMap/1/4k          9192           9177          73951
  BM_DumpHashMap/1/16k        42011          41970          16789
  BM_DumpHashMap/1/64k       117895         117661           6135

  Using htab_lookup_batch with this patch:
  Benchmark                Time(ns)        CPU(ns)     Iterations
  ---------------------------------------------------------------
  BM_DumpHashMap/1/1k          2809           2803         249212
  BM_DumpHashMap/1/4k          5318           5316         100000
  BM_DumpHashMap/1/16k        14925          14895          47448
  BM_DumpHashMap/1/64k        58870          58674          10000

Suggested-by: Luigi Rizzo <lrizzo@google.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 kernel/bpf/hashtab.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2137e2200d95..150015ea6737 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	struct hlist_nulls_head *head;
 	struct hlist_nulls_node *n;
 	unsigned long flags = 0;
-	bool locked = false;
 	struct htab_elem *l;
 	struct bucket *b;
 	int ret = 0;
@@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	dst_val = values;
 	b = &htab->buckets[batch];
 	head = &b->head;
-	/* do not grab the lock unless need it (bucket_cnt > 0). */
-	if (locked)
-		flags = htab_lock_bucket(htab, b);
 
+	l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
+					struct htab_elem, hash_node);
+	if (!l && (batch + 1 < htab->n_buckets)) {
+		batch++;
+		goto again_nocopy;
+	}
+
+	flags = htab_lock_bucket(htab, b);
 	bucket_cnt = 0;
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		bucket_cnt++;
 
-	if (bucket_cnt && !locked) {
-		locked = true;
-		goto again_nocopy;
-	}
-
 	if (bucket_cnt > (max_count - total)) {
 		if (total == 0)
 			ret = -ENOSPC;
@@ -1448,10 +1447,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		goto alloc;
 	}
 
-	/* Next block is only safe to run if you have grabbed the lock */
-	if (!locked)
-		goto next_batch;
-
 	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
 		memcpy(dst_key, l->key, key_size);
 
@@ -1494,7 +1489,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	}
 
 	htab_unlock_bucket(htab, b, flags);
-	locked = false;
 
 	while (node_to_free) {
 		l = node_to_free;
@@ -1502,7 +1496,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		bpf_lru_push_free(&htab->lru, &l->lru_node);
 	}
 
-next_batch:
 	/* If we are not copying data, we can go to next bucket and avoid
 	 * unlocking the rcu.
 	 */
-- 
2.28.0.163.g6104cc2f0b6-goog


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
  2020-08-01  4:57 [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty Brian Vazquez
@ 2020-08-01 16:58 ` Yonghong Song
  2020-08-01 18:11   ` Brian Vazquez
  0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2020-08-01 16:58 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: linux-kernel, netdev, bpf, Luigi Rizzo



On 7/31/20 9:57 PM, Brian Vazquez wrote:
> While running some experiments it was observed that map_lookup_batch was much
> slower than get_next_key + lookup when the syscall overhead is minimal.
> This was because the map_lookup_batch implementation was more expensive
> traversing empty buckets, this can be really costly when the pre-allocated
> map is too big.
> 
> This patch optimizes the case when the bucket is empty so we can move quickly
> to next bucket.
> 
> The benchmark to exercise this is as follows:
> 
> -The map was populate with a single entry to make sure that the syscall overhead
> is not helping the map_batch_lookup.
> -The size of the preallocated map was increased to show the effect of
> traversing empty buckets.
> 
> Results:
> 
>    Using get_next_key + lookup:
> 
>    Benchmark                Time(ns)        CPU(ns)     Iteration
>    ---------------------------------------------------------------
>    BM_DumpHashMap/1/1k          3593           3586         192680
>    BM_DumpHashMap/1/4k          6004           5972         100000
>    BM_DumpHashMap/1/16k        15755          15710          44341
>    BM_DumpHashMap/1/64k        59525          59376          10000

I think "BM_DumpHashMap/1/64k" means the program "BM_DumpHashMap",
the map having only "1" entry, and the map preallocated size is "64k"?
What is the "Iteration" here? The number of runs with the same dump?
The CPU(ns) is the system cpu consumption, right? The Time/CPU is for
all iterations, not just one, right? It would be good
if the above results can be described better, so people can
understand the results better.

> 
>    Using htab_lookup_batch before this patch:
>    Benchmark                Time(ns)        CPU(ns)     Iterations
>    ---------------------------------------------------------------
>    BM_DumpHashMap/1/1k          3933           3927         177978
>    BM_DumpHashMap/1/4k          9192           9177          73951
>    BM_DumpHashMap/1/16k        42011          41970          16789
>    BM_DumpHashMap/1/64k       117895         117661           6135
> 
>    Using htab_lookup_batch with this patch:
>    Benchmark                Time(ns)        CPU(ns)     Iterations
>    ---------------------------------------------------------------
>    BM_DumpHashMap/1/1k          2809           2803         249212
>    BM_DumpHashMap/1/4k          5318           5316         100000
>    BM_DumpHashMap/1/16k        14925          14895          47448
>    BM_DumpHashMap/1/64k        58870          58674          10000
> 
> Suggested-by: Luigi Rizzo <lrizzo@google.com>
> Cc: Yonghong Song <yhs@fb.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>   kernel/bpf/hashtab.c | 23 ++++++++---------------
>   1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 2137e2200d95..150015ea6737 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>   	struct hlist_nulls_head *head;
>   	struct hlist_nulls_node *n;
>   	unsigned long flags = 0;
> -	bool locked = false;
>   	struct htab_elem *l;
>   	struct bucket *b;
>   	int ret = 0;
> @@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>   	dst_val = values;
>   	b = &htab->buckets[batch];
>   	head = &b->head;
> -	/* do not grab the lock unless need it (bucket_cnt > 0). */
> -	if (locked)
> -		flags = htab_lock_bucket(htab, b);
>   
> +	l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
> +					struct htab_elem, hash_node);
> +	if (!l && (batch + 1 < htab->n_buckets)) {
> +		batch++;
> +		goto again_nocopy;
> +	}
> +
> +	flags = htab_lock_bucket(htab, b);
[...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty
  2020-08-01 16:58 ` Yonghong Song
@ 2020-08-01 18:11   ` Brian Vazquez
  0 siblings, 0 replies; 3+ messages in thread
From: Brian Vazquez @ 2020-08-01 18:11 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, open list, Linux NetDev, bpf, Luigi Rizzo

On Sat, Aug 1, 2020 at 9:59 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/31/20 9:57 PM, Brian Vazquez wrote:
> > While running some experiments it was observed that map_lookup_batch was much
> > slower than get_next_key + lookup when the syscall overhead is minimal.
> > This was because the map_lookup_batch implementation was more expensive
> > traversing empty buckets, this can be really costly when the pre-allocated
> > map is too big.
> >
> > This patch optimizes the case when the bucket is empty so we can move quickly
> > to next bucket.
> >
> > The benchmark to exercise this is as follows:
> >
> > -The map was populate with a single entry to make sure that the syscall overhead
> > is not helping the map_batch_lookup.
> > -The size of the preallocated map was increased to show the effect of
> > traversing empty buckets.
> >
> > Results:
> >
> >    Using get_next_key + lookup:
> >
> >    Benchmark                Time(ns)        CPU(ns)     Iteration
> >    ---------------------------------------------------------------
> >    BM_DumpHashMap/1/1k          3593           3586         192680
> >    BM_DumpHashMap/1/4k          6004           5972         100000
> >    BM_DumpHashMap/1/16k        15755          15710          44341
> >    BM_DumpHashMap/1/64k        59525          59376          10000
>
> I think "BM_DumpHashMap/1/64k" means the program "BM_DumpHashMap",
> the map having only "1" entry, and the map preallocated size is "64k"?
> What is the "Iteration" here? The number of runs with the same dump?
> The CPU(ns) is the system cpu consumption, right? The Time/CPU is for
> all iterations, not just one, right? It would be good
> if the above results can be described better, so people can
> understand the results better.
>

Hi Yonghong, thanks for reviewing it!

I'll fix it in next iteration.
> >
> >    Using htab_lookup_batch before this patch:
> >    Benchmark                Time(ns)        CPU(ns)     Iterations
> >    ---------------------------------------------------------------
> >    BM_DumpHashMap/1/1k          3933           3927         177978
> >    BM_DumpHashMap/1/4k          9192           9177          73951
> >    BM_DumpHashMap/1/16k        42011          41970          16789
> >    BM_DumpHashMap/1/64k       117895         117661           6135
> >
> >    Using htab_lookup_batch with this patch:
> >    Benchmark                Time(ns)        CPU(ns)     Iterations
> >    ---------------------------------------------------------------
> >    BM_DumpHashMap/1/1k          2809           2803         249212
> >    BM_DumpHashMap/1/4k          5318           5316         100000
> >    BM_DumpHashMap/1/16k        14925          14895          47448
> >    BM_DumpHashMap/1/64k        58870          58674          10000
> >
> > Suggested-by: Luigi Rizzo <lrizzo@google.com>
> > Cc: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > ---
> >   kernel/bpf/hashtab.c | 23 ++++++++---------------
> >   1 file changed, 8 insertions(+), 15 deletions(-)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 2137e2200d95..150015ea6737 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1351,7 +1351,6 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >       struct hlist_nulls_head *head;
> >       struct hlist_nulls_node *n;
> >       unsigned long flags = 0;
> > -     bool locked = false;
> >       struct htab_elem *l;
> >       struct bucket *b;
> >       int ret = 0;
> > @@ -1410,19 +1409,19 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >       dst_val = values;
> >       b = &htab->buckets[batch];
> >       head = &b->head;
> > -     /* do not grab the lock unless need it (bucket_cnt > 0). */
> > -     if (locked)
> > -             flags = htab_lock_bucket(htab, b);
> >
> > +     l = hlist_nulls_entry_safe(rcu_dereference_raw(hlist_nulls_first_rcu(head)),
> > +                                     struct htab_elem, hash_node);
> > +     if (!l && (batch + 1 < htab->n_buckets)) {
> > +             batch++;
> > +             goto again_nocopy;
> > +     }
> > +
> > +     flags = htab_lock_bucket(htab, b);
> [...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-01  4:57 [PATCH bpf-next] bpf: make __htab_lookup_and_delete_batch faster when map is almost empty Brian Vazquez
2020-08-01 16:58 ` Yonghong Song
2020-08-01 18:11   ` Brian Vazquez

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git