[PATCH bpf] bpf: improve htab_map_get_next_key behaviour during races

* [PATCH bpf] bpf: improve htab_map_get_next_key behaviour during races
@ 2019-10-18 13:43 Lorenz Bauer
  2019-10-18 16:03 ` Alexei Starovoitov
  0 siblings, 1 reply; 3+ messages in thread
From: Lorenz Bauer @ 2019-10-18 13:43 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: joe, Lorenz Bauer

To iterate a BPF map, userspace must use MAP_GET_NEXT_KEY and provide
the last retrieved key. The code then scans the hash table bucket
for the key and returns the key of the next item.

This presents a problem if the last retrieved key isn't present in the
hash table anymore, e.g. due to concurrent deletion. It's not possible
to ascertain the location of a key in a given bucket, so there isn't
really a correct answer. The implementation currently returns the
first key in the first bucket. This guarantees that we never skip an
existing key. However, it means that a user space program iterating
a heavily modified map may never reach the end of the hash table,
forever restarting at the beginning.

Fixing this outright is rather involved. However, we can improve slightly
by never revisiting earlier buckets. Instead of the first key in the
first bucket we return the first key in the "current" bucket. This
doesn't eliminate the problem, but makes it less likely to occur.

Prior to commit 8fe45924387b ("bpf: map_get_next_key to return first key on NULL")
passing a non-existent key to MAP_GET_NEXT_KEY was the only way to
find the first key. Hence there is a small chance that there is code that
will be broken by this change.

Fixes: 8fe45924387b ("bpf: map_get_next_key to return first key on NULL")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 kernel/bpf/hashtab.c                    | 4 +++-
 tools/testing/selftests/bpf/test_maps.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c9..30f0dab488f0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -613,6 +613,9 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 
 	head = select_bucket(htab, hash);
 
+	/* don't iterate previous buckets */
+	i = hash & (htab->n_buckets - 1);
+
 	/* lookup the key */
 	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
 
@@ -630,7 +633,6 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	}
 
 	/* no more elements in this hash list, go to the next bucket */
-	i = hash & (htab->n_buckets - 1);
 	i++;
 
 find_first_elem:
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 806b298397d3..6f351e532ddc 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -100,7 +100,7 @@ static void test_hashmap(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, NULL, &first_key) == 0 &&
 	       (first_key == 1 || first_key == 2));
 	assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 &&
-	       (next_key == first_key));
+	       (next_key == 1 || next_key == 2));
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 &&
 	       (next_key == 1 || next_key == 2) &&
 	       (next_key != first_key));
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread