* [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup
@ 2019-05-30  1:03 Roman Gushchin
  2019-05-30  1:03 ` [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage Roman Gushchin
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

During my work on memcg-based memory accounting for bpf maps
I've done some cleanups and refactoring of the existing
memlock rlimit-based code. This makes the code more robust,
and unifies the size-to-pages conversion, the size checks and
the corresponding error codes. It also adds coverage for
cgroup local storage and socket local storage maps.

It looks like some preliminary work on the mm side might be
required before work on the memcg-based accounting can start,
so I'm sending these patches as a separate patchset.


Roman Gushchin (5):
  bpf: add memlock precharge check for cgroup_local_storage
  bpf: add memlock precharge for socket local storage
  bpf: group memory related fields in struct bpf_map_memory
  bpf: rework memlock-based memory accounting for maps
  bpf: move memory size checks to bpf_map_charge_init()

 include/linux/bpf.h           | 15 ++++--
 kernel/bpf/arraymap.c         | 18 ++++----
 kernel/bpf/cpumap.c           |  9 ++--
 kernel/bpf/devmap.c           | 14 +++---
 kernel/bpf/hashtab.c          | 14 ++----
 kernel/bpf/local_storage.c    | 13 ++++--
 kernel/bpf/lpm_trie.c         |  8 +---
 kernel/bpf/queue_stack_maps.c | 13 +++---
 kernel/bpf/reuseport_array.c  | 17 +++----
 kernel/bpf/stackmap.c         | 28 ++++++-----
 kernel/bpf/syscall.c          | 87 ++++++++++++++++++-----------------
 kernel/bpf/xskmap.c           | 10 ++--
 net/core/bpf_sk_storage.c     | 12 ++++-
 net/core/sock_map.c           |  9 +---
 14 files changed, 132 insertions(+), 135 deletions(-)

-- 
2.20.1



* [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
@ 2019-05-30  1:03 ` Roman Gushchin
  2019-05-30 18:26   ` Song Liu
  2019-05-30  1:03 ` [PATCH bpf-next 2/5] bpf: add memlock precharge for socket local storage Roman Gushchin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

Cgroup local storage maps lack the memlock precharge check,
which is performed before the memory allocation for
most other bpf map types.

Let's add it in order to unify all map types.
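
For context, the precharge helper being called here is the existing
bpf_map_precharge_memlock() in kernel/bpf/syscall.c (quoted below with
comments added; note that it only tests the rlimit and does not charge
anything itself):

	int bpf_map_precharge_memlock(u32 pages)
	{
		struct user_struct *user = get_current_user();
		unsigned long memlock_limit, cur;

		memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
		cur = atomic_long_read(&user->locked_vm);
		free_uid(user);
		/* would the new pages still fit under RLIMIT_MEMLOCK? */
		if (cur + pages > memlock_limit)
			return -EPERM;
		return 0;
	}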

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/local_storage.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 980e8f1f6cb5..e48302ecb389 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -272,6 +272,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_cgroup_storage_map *map;
+	u32 pages;
+	int ret;
 
 	if (attr->key_size != sizeof(struct bpf_cgroup_storage_key))
 		return ERR_PTR(-EINVAL);
@@ -290,13 +292,18 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
+	pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
+		PAGE_SHIFT;
+	ret = bpf_map_precharge_memlock(pages);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
 			   __GFP_ZERO | GFP_USER, numa_node);
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
-	map->map.pages = round_up(sizeof(struct bpf_cgroup_storage_map),
-				  PAGE_SIZE) >> PAGE_SHIFT;
+	map->map.pages = pages;
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&map->map, attr);
-- 
2.20.1



* [PATCH bpf-next 2/5] bpf: add memlock precharge for socket local storage
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
  2019-05-30  1:03 ` [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage Roman Gushchin
@ 2019-05-30  1:03 ` Roman Gushchin
  2019-05-30 18:26   ` Song Liu
  2019-05-30  1:03 ` [PATCH bpf-next 3/5] bpf: group memory related fields in struct bpf_map_memory Roman Gushchin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

Socket local storage maps lack the memlock precharge check,
which is performed before the memory allocation for
most other bpf map types.

Let's add it in order to unify all map types.
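
For a rough sense of scale (hypothetical numbers, assuming a 16-CPU
machine): bucket_log = ilog2(roundup_pow_of_two(16)) = 4, so
nbuckets = 16, and the precharged amount is

	cost  = 16 * sizeof(*smap->buckets) + sizeof(*smap);
	pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;

which for structures of this size typically rounds up to a single page.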

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 net/core/bpf_sk_storage.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index cc9597a87770..9a8aaf8e235d 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -626,7 +626,9 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	struct bpf_sk_storage_map *smap;
 	unsigned int i;
 	u32 nbuckets;
+	u32 pages;
 	u64 cost;
+	int ret;
 
 	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
 	if (!smap)
@@ -635,13 +637,19 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 
 	smap->bucket_log = ilog2(roundup_pow_of_two(num_possible_cpus()));
 	nbuckets = 1U << smap->bucket_log;
+	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
+	pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	ret = bpf_map_precharge_memlock(pages);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN);
 	if (!smap->buckets) {
 		kfree(smap);
 		return ERR_PTR(-ENOMEM);
 	}
-	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
 
 	for (i = 0; i < nbuckets; i++) {
 		INIT_HLIST_HEAD(&smap->buckets[i].list);
@@ -651,7 +659,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
 	smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
 		BPF_SK_STORAGE_CACHE_SIZE;
-	smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	smap->map.pages = pages;
 
 	return &smap->map;
 }
-- 
2.20.1



* [PATCH bpf-next 3/5] bpf: group memory related fields in struct bpf_map_memory
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
  2019-05-30  1:03 ` [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage Roman Gushchin
  2019-05-30  1:03 ` [PATCH bpf-next 2/5] bpf: add memlock precharge for socket local storage Roman Gushchin
@ 2019-05-30  1:03 ` Roman Gushchin
  2019-05-30 18:53   ` Song Liu
  2019-05-30  1:03 ` [PATCH bpf-next 4/5] bpf: rework memlock-based memory accounting for maps Roman Gushchin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

Group "user" and "pages" fields of bpf_map into the bpf_map_memory
structure. Later it can be extended with "memcg" and other related
information.

The main reason for a such change (beside cosmetics) is to pass
bpf_map_memory structure to charging functions before the actual
allocation of bpf_map.
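
For reference, the resulting structure (from the diff below), with a
sketch of the kind of extension mentioned above; the memcg field is
hypothetical here and not part of this series:

	struct bpf_map_memory {
		u32 pages;			/* charged size in pages */
		struct user_struct *user;	/* user charged against RLIMIT_MEMLOCK */
		/* a future memcg-based extension could add, e.g.:
		 * struct mem_cgroup *memcg;
		 */
	};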

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h           | 10 +++++++---
 kernel/bpf/arraymap.c         |  2 +-
 kernel/bpf/cpumap.c           |  4 ++--
 kernel/bpf/devmap.c           |  4 ++--
 kernel/bpf/hashtab.c          |  4 ++--
 kernel/bpf/local_storage.c    |  2 +-
 kernel/bpf/lpm_trie.c         |  4 ++--
 kernel/bpf/queue_stack_maps.c |  2 +-
 kernel/bpf/reuseport_array.c  |  2 +-
 kernel/bpf/stackmap.c         |  4 ++--
 kernel/bpf/syscall.c          | 19 ++++++++++---------
 kernel/bpf/xskmap.c           |  4 ++--
 net/core/bpf_sk_storage.c     |  2 +-
 net/core/sock_map.c           |  4 ++--
 14 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ff3e00ff84d2..980b7a9bdd21 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -66,6 +66,11 @@ struct bpf_map_ops {
 				     u64 imm, u32 *off);
 };
 
+struct bpf_map_memory {
+	u32 pages;
+	struct user_struct *user;
+};
+
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -86,7 +91,7 @@ struct bpf_map {
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
 	struct btf *btf;
-	u32 pages;
+	struct bpf_map_memory memory;
 	bool unpriv_array;
 	bool frozen; /* write-once */
 	/* 48 bytes hole */
@@ -94,8 +99,7 @@ struct bpf_map {
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
-	struct user_struct *user ____cacheline_aligned;
-	atomic_t refcnt;
+	atomic_t refcnt ____cacheline_aligned;
 	atomic_t usercnt;
 	struct work_struct work;
 	char name[BPF_OBJ_NAME_LEN];
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 584636c9e2eb..8fda24e78193 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -138,7 +138,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
-	array->map.pages = cost;
+	array->map.memory.pages = cost;
 	array->elem_size = elem_size;
 
 	if (percpu && bpf_array_alloc_percpu(array)) {
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index cf727d77c6c6..035268add724 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -108,10 +108,10 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_cmap;
-	cmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	cmap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	ret = bpf_map_precharge_memlock(cmap->map.pages);
+	ret = bpf_map_precharge_memlock(cmap->map.memory.pages);
 	if (ret) {
 		err = ret;
 		goto free_cmap;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 1e525d70f833..f6c57efb1d0d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -111,10 +111,10 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_dtab;
 
-	dtab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	dtab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	/* if map size is larger than memlock limit, reject it early */
-	err = bpf_map_precharge_memlock(dtab->map.pages);
+	err = bpf_map_precharge_memlock(dtab->map.memory.pages);
 	if (err)
 		goto free_dtab;
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 0f2708fde5f7..15bf228d2e98 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -364,10 +364,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		/* make sure page count doesn't overflow */
 		goto free_htab;
 
-	htab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	htab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	/* if map size is larger than memlock limit, reject it early */
-	err = bpf_map_precharge_memlock(htab->map.pages);
+	err = bpf_map_precharge_memlock(htab->map.memory.pages);
 	if (err)
 		goto free_htab;
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index e48302ecb389..574325276650 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -303,7 +303,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 	if (!map)
 		return ERR_PTR(-ENOMEM);
 
-	map->map.pages = pages;
+	map->map.memory.pages = pages;
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&map->map, attr);
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index e61630c2e50b..8e423a582760 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -578,9 +578,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 		goto out_err;
 	}
 
-	trie->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	trie->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	ret = bpf_map_precharge_memlock(trie->map.pages);
+	ret = bpf_map_precharge_memlock(trie->map.memory.pages);
 	if (ret)
 		goto out_err;
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 0b140d236889..8a510e71d486 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -89,7 +89,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&qs->map, attr);
 
-	qs->map.pages = cost;
+	qs->map.memory.pages = cost;
 	qs->size = size;
 
 	raw_spin_lock_init(&qs->lock);
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 18e225de80ff..819515242739 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -176,7 +176,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
-	array->map.pages = cost;
+	array->map.memory.pages = cost;
 
 	return &array->map;
 }
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 950ab2f28922..08d4efff73ac 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -131,9 +131,9 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	bpf_map_init_from_attr(&smap->map, attr);
 	smap->map.value_size = value_size;
 	smap->n_buckets = n_buckets;
-	smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	smap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	err = bpf_map_precharge_memlock(smap->map.pages);
+	err = bpf_map_precharge_memlock(smap->map.memory.pages);
 	if (err)
 		goto free_smap;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3d546b6f4646..df14e63806c8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -222,19 +222,20 @@ static int bpf_map_init_memlock(struct bpf_map *map)
 	struct user_struct *user = get_current_user();
 	int ret;
 
-	ret = bpf_charge_memlock(user, map->pages);
+	ret = bpf_charge_memlock(user, map->memory.pages);
 	if (ret) {
 		free_uid(user);
 		return ret;
 	}
-	map->user = user;
+	map->memory.user = user;
 	return ret;
 }
 
 static void bpf_map_release_memlock(struct bpf_map *map)
 {
-	struct user_struct *user = map->user;
-	bpf_uncharge_memlock(user, map->pages);
+	struct user_struct *user = map->memory.user;
+
+	bpf_uncharge_memlock(user, map->memory.pages);
 	free_uid(user);
 }
 
@@ -242,17 +243,17 @@ int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
 {
 	int ret;
 
-	ret = bpf_charge_memlock(map->user, pages);
+	ret = bpf_charge_memlock(map->memory.user, pages);
 	if (ret)
 		return ret;
-	map->pages += pages;
+	map->memory.pages += pages;
 	return ret;
 }
 
 void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages)
 {
-	bpf_uncharge_memlock(map->user, pages);
-	map->pages -= pages;
+	bpf_uncharge_memlock(map->memory.user, pages);
+	map->memory.pages -= pages;
 }
 
 static int bpf_map_alloc_id(struct bpf_map *map)
@@ -395,7 +396,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
-		   map->pages * 1ULL << PAGE_SHIFT,
+		   map->memory.pages * 1ULL << PAGE_SHIFT,
 		   map->id,
 		   READ_ONCE(map->frozen));
 
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 686d244e798d..f816ee1a0fa0 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -40,10 +40,10 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_m;
 
-	m->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	m->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	err = bpf_map_precharge_memlock(m->map.pages);
+	err = bpf_map_precharge_memlock(m->map.memory.pages);
 	if (err)
 		goto free_m;
 
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 9a8aaf8e235d..92581c3ff220 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -659,7 +659,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
 	smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
 		BPF_SK_STORAGE_CACHE_SIZE;
-	smap->map.pages = pages;
+	smap->map.memory.pages = pages;
 
 	return &smap->map;
 }
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index be6092ac69f8..4eb5b6a1b29f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -49,8 +49,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 		goto free_stab;
 	}
 
-	stab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-	err = bpf_map_precharge_memlock(stab->map.pages);
+	stab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+	err = bpf_map_precharge_memlock(stab->map.memory.pages);
 	if (err)
 		goto free_stab;
 
-- 
2.20.1



* [PATCH bpf-next 4/5] bpf: rework memlock-based memory accounting for maps
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
                   ` (2 preceding siblings ...)
  2019-05-30  1:03 ` [PATCH bpf-next 3/5] bpf: group memory related fields in struct bpf_map_memory Roman Gushchin
@ 2019-05-30  1:03 ` Roman Gushchin
  2019-05-30 18:52   ` Song Liu
  2019-05-30  1:03 ` [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init() Roman Gushchin
  2019-06-01  0:00 ` [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Alexei Starovoitov
  5 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

In order to unify the existing memlock charging code with the
memcg-based memory accounting, which will be added later, let's
rework the current scheme.

Currently the following design is used:
  1) .alloc() callback optionally checks if the allocation will likely
     succeed using bpf_map_precharge_memlock()
  2) .alloc() performs actual allocations
  3) .alloc() callback calculates map cost and sets map.memory.pages
  4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
     and performs actual charging; in case of failure the map is
     destroyed
  <map is in use>
  1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
     performs uncharge and releases the user
  2) .map_free() callback releases the memory

The scheme can be simplified and made more robust:
  1) .alloc() calculates map cost and calls bpf_map_charge_init()
  2) bpf_map_charge_init() sets map.memory.user and performs actual
    charge
  3) .alloc() performs actual allocations
  <map is in use>
  1) .map_free() callback releases the memory
  2) bpf_map_charge_finish() performs uncharge and releases the user

The new scheme also allows reusing the bpf_map_charge_init()/finish()
functions for memcg-based accounting. Because charges are performed
before actual allocations and uncharges after freeing the memory,
no bogus memory pressure can be created.

In cases when the map structure is not available (e.g. it's not
created yet, or is already destroyed), an on-stack bpf_map_memory
structure is used. The charge can be transferred with the
bpf_map_charge_move() function.
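
Condensed, the new allocation and destruction paths look like this
(a sketch based on the arraymap and syscall.c changes below; cost is
the map-specific size, already converted to pages at this point of
the series):

	/* .alloc() callback */
	struct bpf_map_memory mem;

	ret = bpf_map_charge_init(&mem, cost);	/* charge before allocating */
	if (ret < 0)
		return ERR_PTR(ret);

	array = bpf_map_area_alloc(array_size, numa_node);
	if (!array) {
		bpf_map_charge_finish(&mem);	/* uncharge on failure */
		return ERR_PTR(-ENOMEM);
	}
	/* hand the charge over to the map for its lifetime */
	bpf_map_charge_move(&array->map.memory, &mem);

	/* bpf_map_free_deferred(), security hook elided */
	bpf_map_charge_move(&mem, &map->memory);	/* detach the charge */
	map->ops->map_free(map);			/* free the memory first */
	bpf_map_charge_finish(&mem);			/* then uncharge */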

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h           |  5 ++-
 kernel/bpf/arraymap.c         | 10 +++--
 kernel/bpf/cpumap.c           |  8 ++--
 kernel/bpf/devmap.c           | 13 ++++---
 kernel/bpf/hashtab.c          | 11 +++---
 kernel/bpf/local_storage.c    |  9 +++--
 kernel/bpf/lpm_trie.c         |  5 +--
 kernel/bpf/queue_stack_maps.c |  9 +++--
 kernel/bpf/reuseport_array.c  |  9 +++--
 kernel/bpf/stackmap.c         | 30 ++++++++-------
 kernel/bpf/syscall.c          | 69 +++++++++++++++++------------------
 kernel/bpf/xskmap.c           |  9 +++--
 net/core/bpf_sk_storage.c     |  8 ++--
 net/core/sock_map.c           |  5 ++-
 14 files changed, 112 insertions(+), 88 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 980b7a9bdd21..6187203b0414 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -600,9 +600,12 @@ struct bpf_map *__bpf_map_get(struct fd f);
 struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
-int bpf_map_precharge_memlock(u32 pages);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
 void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
+int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages);
+void bpf_map_charge_finish(struct bpf_map_memory *mem);
+void bpf_map_charge_move(struct bpf_map_memory *dst,
+			 struct bpf_map_memory *src);
 void *bpf_map_area_alloc(size_t size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8fda24e78193..3552da4407d9 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -83,6 +83,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	u32 elem_size, index_mask, max_entries;
 	bool unpriv = !capable(CAP_SYS_ADMIN);
 	u64 cost, array_size, mask64;
+	struct bpf_map_memory mem;
 	struct bpf_array *array;
 
 	elem_size = round_up(attr->value_size, 8);
@@ -125,23 +126,26 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	}
 	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	ret = bpf_map_precharge_memlock(cost);
+	ret = bpf_map_charge_init(&mem, cost);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(array_size, numa_node);
-	if (!array)
+	if (!array) {
+		bpf_map_charge_finish(&mem);
 		return ERR_PTR(-ENOMEM);
+	}
 	array->index_mask = index_mask;
 	array->map.unpriv_array = unpriv;
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
-	array->map.memory.pages = cost;
+	bpf_map_charge_move(&array->map.memory, &mem);
 	array->elem_size = elem_size;
 
 	if (percpu && bpf_array_alloc_percpu(array)) {
+		bpf_map_charge_finish(&array->map.memory);
 		bpf_map_area_free(array);
 		return ERR_PTR(-ENOMEM);
 	}
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 035268add724..c633c8d68023 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -108,10 +108,10 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_cmap;
-	cmap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	ret = bpf_map_precharge_memlock(cmap->map.memory.pages);
+	ret = bpf_map_charge_init(&cmap->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (ret) {
 		err = ret;
 		goto free_cmap;
@@ -121,7 +121,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
 					    __alignof__(unsigned long));
 	if (!cmap->flush_needed)
-		goto free_cmap;
+		goto free_charge;
 
 	/* Alloc array for possible remote "destination" CPUs */
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
@@ -133,6 +133,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	return &cmap->map;
 free_percpu:
 	free_percpu(cmap->flush_needed);
+free_charge:
+	bpf_map_charge_finish(&cmap->map.memory);
 free_cmap:
 	kfree(cmap);
 	return ERR_PTR(err);
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f6c57efb1d0d..371bd880ed58 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -111,10 +111,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_dtab;
 
-	dtab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
-	/* if map size is larger than memlock limit, reject it early */
-	err = bpf_map_precharge_memlock(dtab->map.memory.pages);
+	/* if map size is larger than memlock limit, reject it */
+	err = bpf_map_charge_init(&dtab->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (err)
 		goto free_dtab;
 
@@ -125,19 +124,21 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 						__alignof__(unsigned long),
 						GFP_KERNEL | __GFP_NOWARN);
 	if (!dtab->flush_needed)
-		goto free_dtab;
+		goto free_charge;
 
 	dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
 					      sizeof(struct bpf_dtab_netdev *),
 					      dtab->map.numa_node);
 	if (!dtab->netdev_map)
-		goto free_dtab;
+		goto free_charge;
 
 	spin_lock(&dev_map_lock);
 	list_add_tail_rcu(&dtab->list, &dev_map_list);
 	spin_unlock(&dev_map_lock);
 
 	return &dtab->map;
+free_charge:
+	bpf_map_charge_finish(&dtab->map.memory);
 free_dtab:
 	free_percpu(dtab->flush_needed);
 	kfree(dtab);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 15bf228d2e98..b0bdc7b040ad 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -364,10 +364,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 		/* make sure page count doesn't overflow */
 		goto free_htab;
 
-	htab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
-	/* if map size is larger than memlock limit, reject it early */
-	err = bpf_map_precharge_memlock(htab->map.memory.pages);
+	/* if map size is larger than memlock limit, reject it */
+	err = bpf_map_charge_init(&htab->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (err)
 		goto free_htab;
 
@@ -376,7 +375,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 					   sizeof(struct bucket),
 					   htab->map.numa_node);
 	if (!htab->buckets)
-		goto free_htab;
+		goto free_charge;
 
 	if (htab->map.map_flags & BPF_F_ZERO_SEED)
 		htab->hashrnd = 0;
@@ -409,6 +408,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	prealloc_destroy(htab);
 free_buckets:
 	bpf_map_area_free(htab->buckets);
+free_charge:
+	bpf_map_charge_finish(&htab->map.memory);
 free_htab:
 	kfree(htab);
 	return ERR_PTR(err);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 574325276650..e49bfd4f4f6d 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -272,6 +272,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_cgroup_storage_map *map;
+	struct bpf_map_memory mem;
 	u32 pages;
 	int ret;
 
@@ -294,16 +295,18 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 
 	pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
 		PAGE_SHIFT;
-	ret = bpf_map_precharge_memlock(pages);
+	ret = bpf_map_charge_init(&mem, pages);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
 	map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
 			   __GFP_ZERO | GFP_USER, numa_node);
-	if (!map)
+	if (!map) {
+		bpf_map_charge_finish(&mem);
 		return ERR_PTR(-ENOMEM);
+	}
 
-	map->map.memory.pages = pages;
+	bpf_map_charge_move(&map->map.memory, &mem);
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&map->map, attr);
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 8e423a582760..6345a8d2dcd0 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -578,9 +578,8 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 		goto out_err;
 	}
 
-	trie->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
-	ret = bpf_map_precharge_memlock(trie->map.memory.pages);
+	ret = bpf_map_charge_init(&trie->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (ret)
 		goto out_err;
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 8a510e71d486..224cb0fd8f03 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -67,6 +67,7 @@ static int queue_stack_map_alloc_check(union bpf_attr *attr)
 static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 {
 	int ret, numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_map_memory mem = {0};
 	struct bpf_queue_stack *qs;
 	u64 size, queue_size, cost;
 
@@ -77,19 +78,21 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 
 	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	ret = bpf_map_precharge_memlock(cost);
+	ret = bpf_map_charge_init(&mem, cost);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
 	qs = bpf_map_area_alloc(queue_size, numa_node);
-	if (!qs)
+	if (!qs) {
+		bpf_map_charge_finish(&mem);
 		return ERR_PTR(-ENOMEM);
+	}
 
 	memset(qs, 0, sizeof(*qs));
 
 	bpf_map_init_from_attr(&qs->map, attr);
 
-	qs->map.memory.pages = cost;
+	bpf_map_charge_move(&qs->map.memory, &mem);
 	qs->size = size;
 
 	raw_spin_lock_init(&qs->lock);
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 819515242739..5c6e25b1b9b1 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -151,6 +151,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 {
 	int err, numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
+	struct bpf_map_memory mem;
 	u64 cost, array_size;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -165,18 +166,20 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	err = bpf_map_precharge_memlock(cost);
+	err = bpf_map_charge_init(&mem, cost);
 	if (err)
 		return ERR_PTR(err);
 
 	/* allocate all map elements and zero-initialize them */
 	array = bpf_map_area_alloc(array_size, numa_node);
-	if (!array)
+	if (!array) {
+		bpf_map_charge_finish(&mem);
 		return ERR_PTR(-ENOMEM);
+	}
 
 	/* copy mandatory map attributes */
 	bpf_map_init_from_attr(&array->map, attr);
-	array->map.memory.pages = cost;
+	bpf_map_charge_move(&array->map.memory, &mem);
 
 	return &array->map;
 }
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 08d4efff73ac..8da24ca65d97 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -89,6 +89,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 {
 	u32 value_size = attr->value_size;
 	struct bpf_stack_map *smap;
+	struct bpf_map_memory mem;
 	u64 cost, n_buckets;
 	int err;
 
@@ -116,40 +117,43 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	n_buckets = roundup_pow_of_two(attr->max_entries);
 
 	cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
+	if (cost >= U32_MAX - PAGE_SIZE)
+		return ERR_PTR(-E2BIG);
+	cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
 	if (cost >= U32_MAX - PAGE_SIZE)
 		return ERR_PTR(-E2BIG);
 
+	err = bpf_map_charge_init(&mem,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	if (err)
+		return ERR_PTR(err);
+
 	smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
-	if (!smap)
+	if (!smap) {
+		bpf_map_charge_finish(&mem);
 		return ERR_PTR(-ENOMEM);
-
-	err = -E2BIG;
-	cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
-	if (cost >= U32_MAX - PAGE_SIZE)
-		goto free_smap;
+	}
 
 	bpf_map_init_from_attr(&smap->map, attr);
 	smap->map.value_size = value_size;
 	smap->n_buckets = n_buckets;
-	smap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
-	err = bpf_map_precharge_memlock(smap->map.memory.pages);
-	if (err)
-		goto free_smap;
 
 	err = get_callchain_buffers(sysctl_perf_event_max_stack);
 	if (err)
-		goto free_smap;
+		goto free_charge;
 
 	err = prealloc_elems_and_freelist(smap);
 	if (err)
 		goto put_buffers;
 
+	bpf_map_charge_move(&smap->map.memory, &mem);
+
 	return &smap->map;
 
 put_buffers:
 	put_callchain_buffers();
-free_smap:
+free_charge:
+	bpf_map_charge_finish(&mem);
 	bpf_map_area_free(smap);
 	return ERR_PTR(err);
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index df14e63806c8..351cc434c4ad 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -188,19 +188,6 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 	map->numa_node = bpf_map_attr_numa_node(attr);
 }
 
-int bpf_map_precharge_memlock(u32 pages)
-{
-	struct user_struct *user = get_current_user();
-	unsigned long memlock_limit, cur;
-
-	memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-	cur = atomic_long_read(&user->locked_vm);
-	free_uid(user);
-	if (cur + pages > memlock_limit)
-		return -EPERM;
-	return 0;
-}
-
 static int bpf_charge_memlock(struct user_struct *user, u32 pages)
 {
 	unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
@@ -214,29 +201,40 @@ static int bpf_charge_memlock(struct user_struct *user, u32 pages)
 
 static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
 {
-	atomic_long_sub(pages, &user->locked_vm);
+	if (user)
+		atomic_long_sub(pages, &user->locked_vm);
 }
 
-static int bpf_map_init_memlock(struct bpf_map *map)
+int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages)
 {
 	struct user_struct *user = get_current_user();
 	int ret;
 
-	ret = bpf_charge_memlock(user, map->memory.pages);
+	ret = bpf_charge_memlock(user, pages);
 	if (ret) {
 		free_uid(user);
 		return ret;
 	}
-	map->memory.user = user;
-	return ret;
+
+	mem->pages = pages;
+	mem->user = user;
+
+	return 0;
 }
 
-static void bpf_map_release_memlock(struct bpf_map *map)
+void bpf_map_charge_finish(struct bpf_map_memory *mem)
 {
-	struct user_struct *user = map->memory.user;
+	bpf_uncharge_memlock(mem->user, mem->pages);
+	free_uid(mem->user);
+}
 
-	bpf_uncharge_memlock(user, map->memory.pages);
-	free_uid(user);
+void bpf_map_charge_move(struct bpf_map_memory *dst,
+			 struct bpf_map_memory *src)
+{
+	*dst = *src;
+
+	/* Make sure src will not be used for the redundant uncharging. */
+	memset(src, 0, sizeof(struct bpf_map_memory));
 }
 
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
@@ -304,11 +302,13 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
+	struct bpf_map_memory mem;
 
-	bpf_map_release_memlock(map);
+	bpf_map_charge_move(&mem, &map->memory);
 	security_bpf_map_free(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
+	bpf_map_charge_finish(&mem);
 }
 
 static void bpf_map_put_uref(struct bpf_map *map)
@@ -550,6 +550,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 static int map_create(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_map_memory mem;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
@@ -574,7 +575,7 @@ static int map_create(union bpf_attr *attr)
 
 	err = bpf_obj_name_cpy(map->name, attr->map_name);
 	if (err)
-		goto free_map_nouncharge;
+		goto free_map;
 
 	atomic_set(&map->refcnt, 1);
 	atomic_set(&map->usercnt, 1);
@@ -584,20 +585,20 @@ static int map_create(union bpf_attr *attr)
 
 		if (!attr->btf_value_type_id) {
 			err = -EINVAL;
-			goto free_map_nouncharge;
+			goto free_map;
 		}
 
 		btf = btf_get_by_fd(attr->btf_fd);
 		if (IS_ERR(btf)) {
 			err = PTR_ERR(btf);
-			goto free_map_nouncharge;
+			goto free_map;
 		}
 
 		err = map_check_btf(map, btf, attr->btf_key_type_id,
 				    attr->btf_value_type_id);
 		if (err) {
 			btf_put(btf);
-			goto free_map_nouncharge;
+			goto free_map;
 		}
 
 		map->btf = btf;
@@ -609,15 +610,11 @@ static int map_create(union bpf_attr *attr)
 
 	err = security_bpf_map_alloc(map);
 	if (err)
-		goto free_map_nouncharge;
-
-	err = bpf_map_init_memlock(map);
-	if (err)
-		goto free_map_sec;
+		goto free_map;
 
 	err = bpf_map_alloc_id(map);
 	if (err)
-		goto free_map;
+		goto free_map_sec;
 
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
@@ -633,13 +630,13 @@ static int map_create(union bpf_attr *attr)
 
 	return err;
 
-free_map:
-	bpf_map_release_memlock(map);
 free_map_sec:
 	security_bpf_map_free(map);
-free_map_nouncharge:
+free_map:
 	btf_put(map->btf);
+	bpf_map_charge_move(&mem, &map->memory);
 	map->ops->map_free(map);
+	bpf_map_charge_finish(&mem);
 	return err;
 }
 
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index f816ee1a0fa0..a329dab7c7a4 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -40,10 +40,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	if (cost >= U32_MAX - PAGE_SIZE)
 		goto free_m;
 
-	m->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	err = bpf_map_precharge_memlock(m->map.memory.pages);
+	err = bpf_map_charge_init(&m->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (err)
 		goto free_m;
 
@@ -51,7 +50,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 	m->flush_list = alloc_percpu(struct list_head);
 	if (!m->flush_list)
-		goto free_m;
+		goto free_charge;
 
 	for_each_possible_cpu(cpu)
 		INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));
@@ -65,6 +64,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 free_percpu:
 	free_percpu(m->flush_list);
+free_charge:
+	bpf_map_charge_finish(&m->map.memory);
 free_m:
 	kfree(m);
 	return ERR_PTR(err);
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 92581c3ff220..621a0b07ff11 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -640,13 +640,16 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
 	pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	ret = bpf_map_precharge_memlock(pages);
-	if (ret < 0)
+	ret = bpf_map_charge_init(&smap->map.memory, pages);
+	if (ret < 0) {
+		kfree(smap);
 		return ERR_PTR(ret);
+	}
 
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN);
 	if (!smap->buckets) {
+		bpf_map_charge_finish(&smap->map.memory);
 		kfree(smap);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -659,7 +662,6 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
 	smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
 		BPF_SK_STORAGE_CACHE_SIZE;
-	smap->map.memory.pages = pages;
 
 	return &smap->map;
 }
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 4eb5b6a1b29f..1028c922a149 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -49,8 +49,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 		goto free_stab;
 	}
 
-	stab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-	err = bpf_map_precharge_memlock(stab->map.memory.pages);
+	err = bpf_map_charge_init(&stab->map.memory,
+				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
 	if (err)
 		goto free_stab;
 
@@ -60,6 +60,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	if (stab->sks)
 		return &stab->map;
 	err = -ENOMEM;
+	bpf_map_charge_finish(&stab->map.memory);
 free_stab:
 	kfree(stab);
 	return ERR_PTR(err);
-- 
2.20.1



* [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init()
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
                   ` (3 preceding siblings ...)
  2019-05-30  1:03 ` [PATCH bpf-next 4/5] bpf: rework memlock-based memory accounting for maps Roman Gushchin
@ 2019-05-30  1:03 ` Roman Gushchin
  2019-05-30 18:56   ` Song Liu
  2019-06-01  0:00 ` [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Alexei Starovoitov
  5 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, bpf
  Cc: kernel-team, linux-kernel, Roman Gushchin

Most bpf map types perform similar size checks and a similar
bytes-to-pages conversion during memory allocation and charging.

Let's unify these checks by moving them into bpf_map_charge_init().
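
After this change a typical caller shrinks to (a sketch; the cost
computation shown is illustrative):

	cost = (u64)attr->max_entries * elem_size + sizeof(*map);	/* bytes */
	err = bpf_map_charge_init(&map->memory, cost);
	if (err)
		goto free_map;

The overflow check (sizes >= U32_MAX - PAGE_SIZE now fail with -E2BIG)
and the round_up(cost, PAGE_SIZE) >> PAGE_SHIFT conversion are performed
once inside bpf_map_charge_init(), as the syscall.c hunk below shows.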

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h           |  2 +-
 kernel/bpf/arraymap.c         |  8 +-------
 kernel/bpf/cpumap.c           |  5 +----
 kernel/bpf/devmap.c           |  5 +----
 kernel/bpf/hashtab.c          |  7 +------
 kernel/bpf/local_storage.c    |  5 +----
 kernel/bpf/lpm_trie.c         |  7 +------
 kernel/bpf/queue_stack_maps.c |  4 ----
 kernel/bpf/reuseport_array.c  | 10 ++--------
 kernel/bpf/stackmap.c         |  8 +-------
 kernel/bpf/syscall.c          |  9 +++++++--
 kernel/bpf/xskmap.c           |  5 +----
 net/core/bpf_sk_storage.c     |  4 +---
 net/core/sock_map.c           |  8 +-------
 14 files changed, 20 insertions(+), 67 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6187203b0414..3997c0038062 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -602,7 +602,7 @@ void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
 void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
-int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages);
+int bpf_map_charge_init(struct bpf_map_memory *mem, size_t size);
 void bpf_map_charge_finish(struct bpf_map_memory *mem);
 void bpf_map_charge_move(struct bpf_map_memory *dst,
 			 struct bpf_map_memory *src);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3552da4407d9..0349cbf23cdb 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -117,14 +117,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
 	/* make sure there is no u32 overflow later in round_up() */
 	cost = array_size;
-	if (cost >= U32_MAX - PAGE_SIZE)
-		return ERR_PTR(-ENOMEM);
-	if (percpu) {
+	if (percpu)
 		cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
-		if (cost >= U32_MAX - PAGE_SIZE)
-			return ERR_PTR(-ENOMEM);
-	}
-	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	ret = bpf_map_charge_init(&mem, cost);
 	if (ret < 0)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index c633c8d68023..b31a71909307 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -106,12 +106,9 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	/* make sure page count doesn't overflow */
 	cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
 	cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
-	if (cost >= U32_MAX - PAGE_SIZE)
-		goto free_cmap;
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	ret = bpf_map_charge_init(&cmap->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	ret = bpf_map_charge_init(&cmap->map.memory, cost);
 	if (ret) {
 		err = ret;
 		goto free_cmap;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 371bd880ed58..5ae7cce5ef16 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -108,12 +108,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 	/* make sure page count doesn't overflow */
 	cost = (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
 	cost += dev_map_bitmap_size(attr) * num_possible_cpus();
-	if (cost >= U32_MAX - PAGE_SIZE)
-		goto free_dtab;
 
 	/* if map size is larger than memlock limit, reject it */
-	err = bpf_map_charge_init(&dtab->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	err = bpf_map_charge_init(&dtab->map.memory, cost);
 	if (err)
 		goto free_dtab;
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b0bdc7b040ad..d92e05d9979b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -360,13 +360,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	else
 	       cost += (u64) htab->elem_size * num_possible_cpus();
 
-	if (cost >= U32_MAX - PAGE_SIZE)
-		/* make sure page count doesn't overflow */
-		goto free_htab;
-
 	/* if map size is larger than memlock limit, reject it */
-	err = bpf_map_charge_init(&htab->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	err = bpf_map_charge_init(&htab->map.memory, cost);
 	if (err)
 		goto free_htab;
 
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index e49bfd4f4f6d..addd6fdceec8 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -273,7 +273,6 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 	int numa_node = bpf_map_attr_numa_node(attr);
 	struct bpf_cgroup_storage_map *map;
 	struct bpf_map_memory mem;
-	u32 pages;
 	int ret;
 
 	if (attr->key_size != sizeof(struct bpf_cgroup_storage_key))
@@ -293,9 +292,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 		/* max_entries is not used and enforced to be 0 */
 		return ERR_PTR(-EINVAL);
 
-	pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
-		PAGE_SHIFT;
-	ret = bpf_map_charge_init(&mem, pages);
+	ret = bpf_map_charge_init(&mem, sizeof(struct bpf_cgroup_storage_map));
 	if (ret < 0)
 		return ERR_PTR(ret);
 
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 6345a8d2dcd0..09334f13a8a0 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -573,13 +573,8 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	cost_per_node = sizeof(struct lpm_trie_node) +
 			attr->value_size + trie->data_size;
 	cost += (u64) attr->max_entries * cost_per_node;
-	if (cost >= U32_MAX - PAGE_SIZE) {
-		ret = -E2BIG;
-		goto out_err;
-	}
 
-	ret = bpf_map_charge_init(&trie->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	ret = bpf_map_charge_init(&trie->map.memory, cost);
 	if (ret)
 		goto out_err;
 
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 224cb0fd8f03..f697647ceb54 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -73,10 +73,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 
 	size = (u64) attr->max_entries + 1;
 	cost = queue_size = sizeof(*qs) + size * attr->value_size;
-	if (cost >= U32_MAX - PAGE_SIZE)
-		return ERR_PTR(-E2BIG);
-
-	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
 	ret = bpf_map_charge_init(&mem, cost);
 	if (ret < 0)
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 5c6e25b1b9b1..50c083ba978c 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -152,7 +152,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	int err, numa_node = bpf_map_attr_numa_node(attr);
 	struct reuseport_array *array;
 	struct bpf_map_memory mem;
-	u64 cost, array_size;
+	u64 array_size;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -160,13 +160,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
 	array_size = sizeof(*array);
 	array_size += (u64)attr->max_entries * sizeof(struct sock *);
 
-	/* make sure there is no u32 overflow later in round_up() */
-	cost = array_size;
-	if (cost >= U32_MAX - PAGE_SIZE)
-		return ERR_PTR(-ENOMEM);
-	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
-
-	err = bpf_map_charge_init(&mem, cost);
+	err = bpf_map_charge_init(&mem, array_size);
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 8da24ca65d97..3d86072d8e32 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -117,14 +117,8 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
 	n_buckets = roundup_pow_of_two(attr->max_entries);
 
 	cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
-	if (cost >= U32_MAX - PAGE_SIZE)
-		return ERR_PTR(-E2BIG);
 	cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
-	if (cost >= U32_MAX - PAGE_SIZE)
-		return ERR_PTR(-E2BIG);
-
-	err = bpf_map_charge_init(&mem,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	err = bpf_map_charge_init(&mem, cost);
 	if (err)
 		return ERR_PTR(err);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 351cc434c4ad..b3e83712b982 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -205,11 +205,16 @@ static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
 		atomic_long_sub(pages, &user->locked_vm);
 }
 
-int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages)
+int bpf_map_charge_init(struct bpf_map_memory *mem, size_t size)
 {
-	struct user_struct *user = get_current_user();
+	u32 pages = round_up(size, PAGE_SIZE) >> PAGE_SHIFT;
+	struct user_struct *user;
 	int ret;
 
+	if (size >= U32_MAX - PAGE_SIZE)
+		return -E2BIG;
+
+	user = get_current_user();
 	ret = bpf_charge_memlock(user, pages);
 	if (ret) {
 		free_uid(user);
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index a329dab7c7a4..22066c28ba61 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -37,12 +37,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 	cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
 	cost += sizeof(struct list_head) * num_possible_cpus();
-	if (cost >= U32_MAX - PAGE_SIZE)
-		goto free_m;
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
-	err = bpf_map_charge_init(&m->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	err = bpf_map_charge_init(&m->map.memory, cost);
 	if (err)
 		goto free_m;
 
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 621a0b07ff11..f40e3d35fd9c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -626,7 +626,6 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	struct bpf_sk_storage_map *smap;
 	unsigned int i;
 	u32 nbuckets;
-	u32 pages;
 	u64 cost;
 	int ret;
 
@@ -638,9 +637,8 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 	smap->bucket_log = ilog2(roundup_pow_of_two(num_possible_cpus()));
 	nbuckets = 1U << smap->bucket_log;
 	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
-	pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
-	ret = bpf_map_charge_init(&smap->map.memory, pages);
+	ret = bpf_map_charge_init(&smap->map.memory, cost);
 	if (ret < 0) {
 		kfree(smap);
 		return ERR_PTR(ret);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 1028c922a149..52d4faeee18b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -44,13 +44,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 
 	/* Make sure page count doesn't overflow. */
 	cost = (u64) stab->map.max_entries * sizeof(struct sock *);
-	if (cost >= U32_MAX - PAGE_SIZE) {
-		err = -EINVAL;
-		goto free_stab;
-	}
-
-	err = bpf_map_charge_init(&stab->map.memory,
-				  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
+	err = bpf_map_charge_init(&stab->map.memory, cost);
 	if (err)
 		goto free_stab;
 
-- 
2.20.1



* Re: [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage
  2019-05-30  1:03 ` [PATCH bpf-next 1/5] bpf: add memlock precharge check for cgroup_local_storage Roman Gushchin
@ 2019-05-30 18:26   ` Song Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Song Liu @ 2019-05-30 18:26 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Wed, May 29, 2019 at 6:05 PM Roman Gushchin <guro@fb.com> wrote:
>
> Cgroup local storage maps lack the memlock precharge check,
> which is performed before the memory allocation for
> most other bpf map types.
>
> Let's add it in order to unify all map types.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/local_storage.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 980e8f1f6cb5..e48302ecb389 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -272,6 +272,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>  {
>         int numa_node = bpf_map_attr_numa_node(attr);
>         struct bpf_cgroup_storage_map *map;
> +       u32 pages;
> +       int ret;
>
>         if (attr->key_size != sizeof(struct bpf_cgroup_storage_key))
>                 return ERR_PTR(-EINVAL);
> @@ -290,13 +292,18 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>                 /* max_entries is not used and enforced to be 0 */
>                 return ERR_PTR(-EINVAL);
>
> +       pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
> +               PAGE_SHIFT;
> +       ret = bpf_map_precharge_memlock(pages);
> +       if (ret < 0)
> +               return ERR_PTR(ret);
> +
>         map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
>                            __GFP_ZERO | GFP_USER, numa_node);
>         if (!map)
>                 return ERR_PTR(-ENOMEM);
>
> -       map->map.pages = round_up(sizeof(struct bpf_cgroup_storage_map),
> -                                 PAGE_SIZE) >> PAGE_SHIFT;
> +       map->map.pages = pages;
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&map->map, attr);
> --
> 2.20.1
>


* Re: [PATCH bpf-next 2/5] bpf: add memlock precharge for socket local storage
  2019-05-30  1:03 ` [PATCH bpf-next 2/5] bpf: add memlock precharge for socket local storage Roman Gushchin
@ 2019-05-30 18:26   ` Song Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Song Liu @ 2019-05-30 18:26 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Wed, May 29, 2019 at 6:05 PM Roman Gushchin <guro@fb.com> wrote:
>
> Socket local storage maps lack the memlock precharge check,
> which is performed before the memory allocation for
> most other bpf map types.
>
> Let's add it in order to unify all map types.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  net/core/bpf_sk_storage.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index cc9597a87770..9a8aaf8e235d 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -626,7 +626,9 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         struct bpf_sk_storage_map *smap;
>         unsigned int i;
>         u32 nbuckets;
> +       u32 pages;
>         u64 cost;
> +       int ret;
>
>         smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
>         if (!smap)
> @@ -635,13 +637,19 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>
>         smap->bucket_log = ilog2(roundup_pow_of_two(num_possible_cpus()));
>         nbuckets = 1U << smap->bucket_log;
> +       cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
> +       pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +
> +       ret = bpf_map_precharge_memlock(pages);
> +       if (ret < 0)
> +               return ERR_PTR(ret);
> +
>         smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
>                                  GFP_USER | __GFP_NOWARN);
>         if (!smap->buckets) {
>                 kfree(smap);
>                 return ERR_PTR(-ENOMEM);
>         }
> -       cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
>
>         for (i = 0; i < nbuckets; i++) {
>                 INIT_HLIST_HEAD(&smap->buckets[i].list);
> @@ -651,7 +659,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
>         smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
>                 BPF_SK_STORAGE_CACHE_SIZE;
> -       smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       smap->map.pages = pages;
>
>         return &smap->map;
>  }
> --
> 2.20.1
>


* Re: [PATCH bpf-next 4/5] bpf: rework memlock-based memory accounting for maps
  2019-05-30  1:03 ` [PATCH bpf-next 4/5] bpf: rework memlock-based memory accounting for maps Roman Gushchin
@ 2019-05-30 18:52   ` Song Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Song Liu @ 2019-05-30 18:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Wed, May 29, 2019 at 6:05 PM Roman Gushchin <guro@fb.com> wrote:
>
> In order to unify the existing memlock charging code with the
> memcg-based memory accounting, which will be added later, let's
> rework the current scheme.
>
> Currently the following design is used:
>   1) .alloc() callback optionally checks if the allocation will likely
>      succeed using bpf_map_precharge_memlock()
>   2) .alloc() performs actual allocations
>   3) .alloc() callback calculates map cost and sets map.memory.pages
>   4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
>      and performs actual charging; in case of failure the map is
>      destroyed
>   <map is in use>
>   1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
>      performs uncharge and releases the user
>   2) .map_free() callback releases the memory
>
> The scheme can be simplified and made more robust:
>   1) .alloc() calculates map cost and calls bpf_map_charge_init()
>   2) bpf_map_charge_init() sets map.memory.user and performs actual
>     charge
>   3) .alloc() performs actual allocations
>   <map is in use>
>   1) .map_free() callback releases the memory
>   2) bpf_map_charge_finish() performs uncharge and releases the user
>
> The new scheme also allows reusing the bpf_map_charge_init()/finish()
> functions for memcg-based accounting. Because charges are performed
> before actual allocations and uncharges after freeing the memory,
> no bogus memory pressure can be created.
>
> In cases where the map structure is not available (e.g. it's not
> created yet, or is already destroyed), an on-stack bpf_map_memory
> structure is used. The charge can be transferred with the
> bpf_map_charge_move() function.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Looks good.

Acked-by: Song Liu <songliubraving@fb.com>
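
To make the new control flow concrete, here is a condensed, hypothetical
allocator following the pattern described above. "struct sketch_map" and
the cost formula are placeholders; the bpf_map_charge_*() helpers are the
ones introduced by this patch, which at this point still take a page
count (patch 5/5 switches them to a size in bytes):

        struct sketch_map {
                struct bpf_map map;     /* must be the first member */
                /* ... map-type-specific fields ... */
        };

        static struct bpf_map *sketch_map_alloc(union bpf_attr *attr)
        {
                int numa_node = bpf_map_attr_numa_node(attr);
                struct bpf_map_memory mem;
                struct sketch_map *m;
                u64 cost;
                int ret;

                /* placeholder cost: the struct plus one pointer per entry */
                cost = sizeof(*m) + (u64)attr->max_entries * sizeof(void *);

                /* 1) charge first: sets mem.user and accounts the pages.
                 * Real callers also bound-check cost before converting;
                 * patch 5/5 moves that check into the helper itself.
                 */
                ret = bpf_map_charge_init(&mem,
                                          round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
                if (ret)
                        return ERR_PTR(ret);

                /* 2) allocate only after the charge succeeded; on failure
                 * the on-stack charge is simply dropped again
                 */
                m = bpf_map_area_alloc(cost, numa_node);
                if (!m) {
                        bpf_map_charge_finish(&mem);
                        return ERR_PTR(-ENOMEM);
                }

                bpf_map_init_from_attr(&m->map, attr);

                /* 3) transfer the charge to the map once it exists */
                bpf_map_charge_move(&m->map.memory, &mem);
                return &m->map;
        }

The release side mirrors this order: .map_free() frees the memory first
and bpf_map_charge_finish() uncharges afterwards, so the charged amount
never drops below what is actually allocated.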

> ---
>  include/linux/bpf.h           |  5 ++-
>  kernel/bpf/arraymap.c         | 10 +++--
>  kernel/bpf/cpumap.c           |  8 ++--
>  kernel/bpf/devmap.c           | 13 ++++---
>  kernel/bpf/hashtab.c          | 11 +++---
>  kernel/bpf/local_storage.c    |  9 +++--
>  kernel/bpf/lpm_trie.c         |  5 +--
>  kernel/bpf/queue_stack_maps.c |  9 +++--
>  kernel/bpf/reuseport_array.c  |  9 +++--
>  kernel/bpf/stackmap.c         | 30 ++++++++-------
>  kernel/bpf/syscall.c          | 69 +++++++++++++++++------------------
>  kernel/bpf/xskmap.c           |  9 +++--
>  net/core/bpf_sk_storage.c     |  8 ++--
>  net/core/sock_map.c           |  5 ++-
>  14 files changed, 112 insertions(+), 88 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 980b7a9bdd21..6187203b0414 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -600,9 +600,12 @@ struct bpf_map *__bpf_map_get(struct fd f);
>  struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
>  void bpf_map_put_with_uref(struct bpf_map *map);
>  void bpf_map_put(struct bpf_map *map);
> -int bpf_map_precharge_memlock(u32 pages);
>  int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
>  void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
> +int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages);
> +void bpf_map_charge_finish(struct bpf_map_memory *mem);
> +void bpf_map_charge_move(struct bpf_map_memory *dst,
> +                        struct bpf_map_memory *src);
>  void *bpf_map_area_alloc(size_t size, int numa_node);
>  void bpf_map_area_free(void *base);
>  void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 8fda24e78193..3552da4407d9 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -83,6 +83,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>         u32 elem_size, index_mask, max_entries;
>         bool unpriv = !capable(CAP_SYS_ADMIN);
>         u64 cost, array_size, mask64;
> +       struct bpf_map_memory mem;
>         struct bpf_array *array;
>
>         elem_size = round_up(attr->value_size, 8);
> @@ -125,23 +126,26 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>         }
>         cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       ret = bpf_map_precharge_memlock(cost);
> +       ret = bpf_map_charge_init(&mem, cost);
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
>         /* allocate all map elements and zero-initialize them */
>         array = bpf_map_area_alloc(array_size, numa_node);
> -       if (!array)
> +       if (!array) {
> +               bpf_map_charge_finish(&mem);
>                 return ERR_PTR(-ENOMEM);
> +       }
>         array->index_mask = index_mask;
>         array->map.unpriv_array = unpriv;
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&array->map, attr);
> -       array->map.memory.pages = cost;
> +       bpf_map_charge_move(&array->map.memory, &mem);
>         array->elem_size = elem_size;
>
>         if (percpu && bpf_array_alloc_percpu(array)) {
> +               bpf_map_charge_finish(&array->map.memory);
>                 bpf_map_area_free(array);
>                 return ERR_PTR(-ENOMEM);
>         }
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index 035268add724..c633c8d68023 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -108,10 +108,10 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_cmap;
> -       cmap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       ret = bpf_map_precharge_memlock(cmap->map.memory.pages);
> +       ret = bpf_map_charge_init(&cmap->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (ret) {
>                 err = ret;
>                 goto free_cmap;
> @@ -121,7 +121,7 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
>                                             __alignof__(unsigned long));
>         if (!cmap->flush_needed)
> -               goto free_cmap;
> +               goto free_charge;
>
>         /* Alloc array for possible remote "destination" CPUs */
>         cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
> @@ -133,6 +133,8 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         return &cmap->map;
>  free_percpu:
>         free_percpu(cmap->flush_needed);
> +free_charge:
> +       bpf_map_charge_finish(&cmap->map.memory);
>  free_cmap:
>         kfree(cmap);
>         return ERR_PTR(err);
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f6c57efb1d0d..371bd880ed58 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -111,10 +111,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_dtab;
>
> -       dtab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
> -       /* if map size is larger than memlock limit, reject it early */
> -       err = bpf_map_precharge_memlock(dtab->map.memory.pages);
> +       /* if map size is larger than memlock limit, reject it */
> +       err = bpf_map_charge_init(&dtab->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (err)
>                 goto free_dtab;
>
> @@ -125,19 +124,21 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
>                                                 __alignof__(unsigned long),
>                                                 GFP_KERNEL | __GFP_NOWARN);
>         if (!dtab->flush_needed)
> -               goto free_dtab;
> +               goto free_charge;
>
>         dtab->netdev_map = bpf_map_area_alloc(dtab->map.max_entries *
>                                               sizeof(struct bpf_dtab_netdev *),
>                                               dtab->map.numa_node);
>         if (!dtab->netdev_map)
> -               goto free_dtab;
> +               goto free_charge;
>
>         spin_lock(&dev_map_lock);
>         list_add_tail_rcu(&dtab->list, &dev_map_list);
>         spin_unlock(&dev_map_lock);
>
>         return &dtab->map;
> +free_charge:
> +       bpf_map_charge_finish(&dtab->map.memory);
>  free_dtab:
>         free_percpu(dtab->flush_needed);
>         kfree(dtab);
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 15bf228d2e98..b0bdc7b040ad 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -364,10 +364,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>                 /* make sure page count doesn't overflow */
>                 goto free_htab;
>
> -       htab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
> -       /* if map size is larger than memlock limit, reject it early */
> -       err = bpf_map_precharge_memlock(htab->map.memory.pages);
> +       /* if map size is larger than memlock limit, reject it */
> +       err = bpf_map_charge_init(&htab->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (err)
>                 goto free_htab;
>
> @@ -376,7 +375,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>                                            sizeof(struct bucket),
>                                            htab->map.numa_node);
>         if (!htab->buckets)
> -               goto free_htab;
> +               goto free_charge;
>
>         if (htab->map.map_flags & BPF_F_ZERO_SEED)
>                 htab->hashrnd = 0;
> @@ -409,6 +408,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>         prealloc_destroy(htab);
>  free_buckets:
>         bpf_map_area_free(htab->buckets);
> +free_charge:
> +       bpf_map_charge_finish(&htab->map.memory);
>  free_htab:
>         kfree(htab);
>         return ERR_PTR(err);
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 574325276650..e49bfd4f4f6d 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -272,6 +272,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>  {
>         int numa_node = bpf_map_attr_numa_node(attr);
>         struct bpf_cgroup_storage_map *map;
> +       struct bpf_map_memory mem;
>         u32 pages;
>         int ret;
>
> @@ -294,16 +295,18 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>
>         pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
>                 PAGE_SHIFT;
> -       ret = bpf_map_precharge_memlock(pages);
> +       ret = bpf_map_charge_init(&mem, pages);
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
>         map = kmalloc_node(sizeof(struct bpf_cgroup_storage_map),
>                            __GFP_ZERO | GFP_USER, numa_node);
> -       if (!map)
> +       if (!map) {
> +               bpf_map_charge_finish(&mem);
>                 return ERR_PTR(-ENOMEM);
> +       }
>
> -       map->map.memory.pages = pages;
> +       bpf_map_charge_move(&map->map.memory, &mem);
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&map->map, attr);
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 8e423a582760..6345a8d2dcd0 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -578,9 +578,8 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
>                 goto out_err;
>         }
>
> -       trie->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
> -       ret = bpf_map_precharge_memlock(trie->map.memory.pages);
> +       ret = bpf_map_charge_init(&trie->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (ret)
>                 goto out_err;
>
> diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> index 8a510e71d486..224cb0fd8f03 100644
> --- a/kernel/bpf/queue_stack_maps.c
> +++ b/kernel/bpf/queue_stack_maps.c
> @@ -67,6 +67,7 @@ static int queue_stack_map_alloc_check(union bpf_attr *attr)
>  static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
>  {
>         int ret, numa_node = bpf_map_attr_numa_node(attr);
> +       struct bpf_map_memory mem = {0};
>         struct bpf_queue_stack *qs;
>         u64 size, queue_size, cost;
>
> @@ -77,19 +78,21 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
>
>         cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       ret = bpf_map_precharge_memlock(cost);
> +       ret = bpf_map_charge_init(&mem, cost);
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
>         qs = bpf_map_area_alloc(queue_size, numa_node);
> -       if (!qs)
> +       if (!qs) {
> +               bpf_map_charge_finish(&mem);
>                 return ERR_PTR(-ENOMEM);
> +       }
>
>         memset(qs, 0, sizeof(*qs));
>
>         bpf_map_init_from_attr(&qs->map, attr);
>
> -       qs->map.memory.pages = cost;
> +       bpf_map_charge_move(&qs->map.memory, &mem);
>         qs->size = size;
>
>         raw_spin_lock_init(&qs->lock);
> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 819515242739..5c6e25b1b9b1 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -151,6 +151,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
>  {
>         int err, numa_node = bpf_map_attr_numa_node(attr);
>         struct reuseport_array *array;
> +       struct bpf_map_memory mem;
>         u64 cost, array_size;
>
>         if (!capable(CAP_SYS_ADMIN))
> @@ -165,18 +166,20 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
>                 return ERR_PTR(-ENOMEM);
>         cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       err = bpf_map_precharge_memlock(cost);
> +       err = bpf_map_charge_init(&mem, cost);
>         if (err)
>                 return ERR_PTR(err);
>
>         /* allocate all map elements and zero-initialize them */
>         array = bpf_map_area_alloc(array_size, numa_node);
> -       if (!array)
> +       if (!array) {
> +               bpf_map_charge_finish(&mem);
>                 return ERR_PTR(-ENOMEM);
> +       }
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&array->map, attr);
> -       array->map.memory.pages = cost;
> +       bpf_map_charge_move(&array->map.memory, &mem);
>
>         return &array->map;
>  }
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 08d4efff73ac..8da24ca65d97 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -89,6 +89,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
>  {
>         u32 value_size = attr->value_size;
>         struct bpf_stack_map *smap;
> +       struct bpf_map_memory mem;
>         u64 cost, n_buckets;
>         int err;
>
> @@ -116,40 +117,43 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
>         n_buckets = roundup_pow_of_two(attr->max_entries);
>
>         cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
> +       if (cost >= U32_MAX - PAGE_SIZE)
> +               return ERR_PTR(-E2BIG);
> +       cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 return ERR_PTR(-E2BIG);
>
> +       err = bpf_map_charge_init(&mem,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr));
> -       if (!smap)
> +       if (!smap) {
> +               bpf_map_charge_finish(&mem);
>                 return ERR_PTR(-ENOMEM);
> -
> -       err = -E2BIG;
> -       cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               goto free_smap;
> +       }
>
>         bpf_map_init_from_attr(&smap->map, attr);
>         smap->map.value_size = value_size;
>         smap->n_buckets = n_buckets;
> -       smap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
> -       err = bpf_map_precharge_memlock(smap->map.memory.pages);
> -       if (err)
> -               goto free_smap;
>
>         err = get_callchain_buffers(sysctl_perf_event_max_stack);
>         if (err)
> -               goto free_smap;
> +               goto free_charge;
>
>         err = prealloc_elems_and_freelist(smap);
>         if (err)
>                 goto put_buffers;
>
> +       bpf_map_charge_move(&smap->map.memory, &mem);
> +
>         return &smap->map;
>
>  put_buffers:
>         put_callchain_buffers();
> -free_smap:
> +free_charge:
> +       bpf_map_charge_finish(&mem);
>         bpf_map_area_free(smap);
>         return ERR_PTR(err);
>  }
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index df14e63806c8..351cc434c4ad 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -188,19 +188,6 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
>         map->numa_node = bpf_map_attr_numa_node(attr);
>  }
>
> -int bpf_map_precharge_memlock(u32 pages)
> -{
> -       struct user_struct *user = get_current_user();
> -       unsigned long memlock_limit, cur;
> -
> -       memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -       cur = atomic_long_read(&user->locked_vm);
> -       free_uid(user);
> -       if (cur + pages > memlock_limit)
> -               return -EPERM;
> -       return 0;
> -}
> -
>  static int bpf_charge_memlock(struct user_struct *user, u32 pages)
>  {
>         unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> @@ -214,29 +201,40 @@ static int bpf_charge_memlock(struct user_struct *user, u32 pages)
>
>  static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
>  {
> -       atomic_long_sub(pages, &user->locked_vm);
> +       if (user)
> +               atomic_long_sub(pages, &user->locked_vm);
>  }
>
> -static int bpf_map_init_memlock(struct bpf_map *map)
> +int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages)
>  {
>         struct user_struct *user = get_current_user();
>         int ret;
>
> -       ret = bpf_charge_memlock(user, map->memory.pages);
> +       ret = bpf_charge_memlock(user, pages);
>         if (ret) {
>                 free_uid(user);
>                 return ret;
>         }
> -       map->memory.user = user;
> -       return ret;
> +
> +       mem->pages = pages;
> +       mem->user = user;
> +
> +       return 0;
>  }
>
> -static void bpf_map_release_memlock(struct bpf_map *map)
> +void bpf_map_charge_finish(struct bpf_map_memory *mem)
>  {
> -       struct user_struct *user = map->memory.user;
> +       bpf_uncharge_memlock(mem->user, mem->pages);
> +       free_uid(mem->user);
> +}
>
> -       bpf_uncharge_memlock(user, map->memory.pages);
> -       free_uid(user);
> +void bpf_map_charge_move(struct bpf_map_memory *dst,
> +                        struct bpf_map_memory *src)
> +{
> +       *dst = *src;
> +
> +       /* Make sure src will not be used for the redundant uncharging. */
> +       memset(src, 0, sizeof(struct bpf_map_memory));
>  }
>
>  int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
> @@ -304,11 +302,13 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
>  static void bpf_map_free_deferred(struct work_struct *work)
>  {
>         struct bpf_map *map = container_of(work, struct bpf_map, work);
> +       struct bpf_map_memory mem;
>
> -       bpf_map_release_memlock(map);
> +       bpf_map_charge_move(&mem, &map->memory);
>         security_bpf_map_free(map);
>         /* implementation dependent freeing */
>         map->ops->map_free(map);
> +       bpf_map_charge_finish(&mem);
>  }
>
>  static void bpf_map_put_uref(struct bpf_map *map)
> @@ -550,6 +550,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>  static int map_create(union bpf_attr *attr)
>  {
>         int numa_node = bpf_map_attr_numa_node(attr);
> +       struct bpf_map_memory mem;
>         struct bpf_map *map;
>         int f_flags;
>         int err;
> @@ -574,7 +575,7 @@ static int map_create(union bpf_attr *attr)
>
>         err = bpf_obj_name_cpy(map->name, attr->map_name);
>         if (err)
> -               goto free_map_nouncharge;
> +               goto free_map;
>
>         atomic_set(&map->refcnt, 1);
>         atomic_set(&map->usercnt, 1);
> @@ -584,20 +585,20 @@ static int map_create(union bpf_attr *attr)
>
>                 if (!attr->btf_value_type_id) {
>                         err = -EINVAL;
> -                       goto free_map_nouncharge;
> +                       goto free_map;
>                 }
>
>                 btf = btf_get_by_fd(attr->btf_fd);
>                 if (IS_ERR(btf)) {
>                         err = PTR_ERR(btf);
> -                       goto free_map_nouncharge;
> +                       goto free_map;
>                 }
>
>                 err = map_check_btf(map, btf, attr->btf_key_type_id,
>                                     attr->btf_value_type_id);
>                 if (err) {
>                         btf_put(btf);
> -                       goto free_map_nouncharge;
> +                       goto free_map;
>                 }
>
>                 map->btf = btf;
> @@ -609,15 +610,11 @@ static int map_create(union bpf_attr *attr)
>
>         err = security_bpf_map_alloc(map);
>         if (err)
> -               goto free_map_nouncharge;
> -
> -       err = bpf_map_init_memlock(map);
> -       if (err)
> -               goto free_map_sec;
> +               goto free_map;
>
>         err = bpf_map_alloc_id(map);
>         if (err)
> -               goto free_map;
> +               goto free_map_sec;
>
>         err = bpf_map_new_fd(map, f_flags);
>         if (err < 0) {
> @@ -633,13 +630,13 @@ static int map_create(union bpf_attr *attr)
>
>         return err;
>
> -free_map:
> -       bpf_map_release_memlock(map);
>  free_map_sec:
>         security_bpf_map_free(map);
> -free_map_nouncharge:
> +free_map:
>         btf_put(map->btf);
> +       bpf_map_charge_move(&mem, &map->memory);
>         map->ops->map_free(map);
> +       bpf_map_charge_finish(&mem);
>         return err;
>  }
>
> diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
> index f816ee1a0fa0..a329dab7c7a4 100644
> --- a/kernel/bpf/xskmap.c
> +++ b/kernel/bpf/xskmap.c
> @@ -40,10 +40,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_m;
>
> -       m->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       err = bpf_map_precharge_memlock(m->map.memory.pages);
> +       err = bpf_map_charge_init(&m->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (err)
>                 goto free_m;
>
> @@ -51,7 +50,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>
>         m->flush_list = alloc_percpu(struct list_head);
>         if (!m->flush_list)
> -               goto free_m;
> +               goto free_charge;
>
>         for_each_possible_cpu(cpu)
>                 INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));
> @@ -65,6 +64,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>
>  free_percpu:
>         free_percpu(m->flush_list);
> +free_charge:
> +       bpf_map_charge_finish(&m->map.memory);
>  free_m:
>         kfree(m);
>         return ERR_PTR(err);
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 92581c3ff220..621a0b07ff11 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -640,13 +640,16 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
>         pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       ret = bpf_map_precharge_memlock(pages);
> -       if (ret < 0)
> +       ret = bpf_map_charge_init(&smap->map.memory, pages);
> +       if (ret < 0) {
> +               kfree(smap);
>                 return ERR_PTR(ret);
> +       }
>
>         smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
>                                  GFP_USER | __GFP_NOWARN);
>         if (!smap->buckets) {
> +               bpf_map_charge_finish(&smap->map.memory);
>                 kfree(smap);
>                 return ERR_PTR(-ENOMEM);
>         }
> @@ -659,7 +662,6 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
>         smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
>                 BPF_SK_STORAGE_CACHE_SIZE;
> -       smap->map.memory.pages = pages;
>
>         return &smap->map;
>  }
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 4eb5b6a1b29f..1028c922a149 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -49,8 +49,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>                 goto free_stab;
>         }
>
> -       stab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -       err = bpf_map_precharge_memlock(stab->map.memory.pages);
> +       err = bpf_map_charge_init(&stab->map.memory,
> +                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
>         if (err)
>                 goto free_stab;
>
> @@ -60,6 +60,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>         if (stab->sks)
>                 return &stab->map;
>         err = -ENOMEM;
> +       bpf_map_charge_finish(&stab->map.memory);
>  free_stab:
>         kfree(stab);
>         return ERR_PTR(err);
> --
> 2.20.1
>


* Re: [PATCH bpf-next 3/5] bpf: group memory related fields in struct bpf_map_memory
  2019-05-30  1:03 ` [PATCH bpf-next 3/5] bpf: group memory related fields in struct bpf_map_memory Roman Gushchin
@ 2019-05-30 18:53   ` Song Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Song Liu @ 2019-05-30 18:53 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Wed, May 29, 2019 at 6:04 PM Roman Gushchin <guro@fb.com> wrote:
>
> Group "user" and "pages" fields of bpf_map into the bpf_map_memory
> structure. Later it can be extended with "memcg" and other related
> information.
>
> The main reason for such a change (besides cosmetics) is to be able
> to pass the bpf_map_memory structure to the charging functions before
> the actual allocation of the bpf_map.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>
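
The point of the grouping shows up in patch 4/5 (quoted earlier in this
thread): once "user" and "pages" live in their own structure, charging
code can operate on a bare struct bpf_map_memory that is not (yet)
embedded in any map. A minimal illustration of that usage, using the
helpers from patch 4/5:

        struct bpf_map_memory mem;      /* on stack, no map needed yet */

        ret = bpf_map_charge_init(&mem, pages);         /* charge early */
        /* ... allocate the map itself ... */
        bpf_map_charge_move(&map->memory, &mem);        /* attach later */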

> ---
>  include/linux/bpf.h           | 10 +++++++---
>  kernel/bpf/arraymap.c         |  2 +-
>  kernel/bpf/cpumap.c           |  4 ++--
>  kernel/bpf/devmap.c           |  4 ++--
>  kernel/bpf/hashtab.c          |  4 ++--
>  kernel/bpf/local_storage.c    |  2 +-
>  kernel/bpf/lpm_trie.c         |  4 ++--
>  kernel/bpf/queue_stack_maps.c |  2 +-
>  kernel/bpf/reuseport_array.c  |  2 +-
>  kernel/bpf/stackmap.c         |  4 ++--
>  kernel/bpf/syscall.c          | 19 ++++++++++---------
>  kernel/bpf/xskmap.c           |  4 ++--
>  net/core/bpf_sk_storage.c     |  2 +-
>  net/core/sock_map.c           |  4 ++--
>  14 files changed, 36 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ff3e00ff84d2..980b7a9bdd21 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -66,6 +66,11 @@ struct bpf_map_ops {
>                                      u64 imm, u32 *off);
>  };
>
> +struct bpf_map_memory {
> +       u32 pages;
> +       struct user_struct *user;
> +};
> +
>  struct bpf_map {
>         /* The first two cachelines with read-mostly members of which some
>          * are also accessed in fast-path (e.g. ops, max_entries).
> @@ -86,7 +91,7 @@ struct bpf_map {
>         u32 btf_key_type_id;
>         u32 btf_value_type_id;
>         struct btf *btf;
> -       u32 pages;
> +       struct bpf_map_memory memory;
>         bool unpriv_array;
>         bool frozen; /* write-once */
>         /* 48 bytes hole */
> @@ -94,8 +99,7 @@ struct bpf_map {
>         /* The 3rd and 4th cacheline with misc members to avoid false sharing
>          * particularly with refcounting.
>          */
> -       struct user_struct *user ____cacheline_aligned;
> -       atomic_t refcnt;
> +       atomic_t refcnt ____cacheline_aligned;
>         atomic_t usercnt;
>         struct work_struct work;
>         char name[BPF_OBJ_NAME_LEN];
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 584636c9e2eb..8fda24e78193 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -138,7 +138,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&array->map, attr);
> -       array->map.pages = cost;
> +       array->map.memory.pages = cost;
>         array->elem_size = elem_size;
>
>         if (percpu && bpf_array_alloc_percpu(array)) {
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index cf727d77c6c6..035268add724 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -108,10 +108,10 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_cmap;
> -       cmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       cmap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       ret = bpf_map_precharge_memlock(cmap->map.pages);
> +       ret = bpf_map_precharge_memlock(cmap->map.memory.pages);
>         if (ret) {
>                 err = ret;
>                 goto free_cmap;
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 1e525d70f833..f6c57efb1d0d 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -111,10 +111,10 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_dtab;
>
> -       dtab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       dtab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         /* if map size is larger than memlock limit, reject it early */
> -       err = bpf_map_precharge_memlock(dtab->map.pages);
> +       err = bpf_map_precharge_memlock(dtab->map.memory.pages);
>         if (err)
>                 goto free_dtab;
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 0f2708fde5f7..15bf228d2e98 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -364,10 +364,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>                 /* make sure page count doesn't overflow */
>                 goto free_htab;
>
> -       htab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       htab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         /* if map size is larger than memlock limit, reject it early */
> -       err = bpf_map_precharge_memlock(htab->map.pages);
> +       err = bpf_map_precharge_memlock(htab->map.memory.pages);
>         if (err)
>                 goto free_htab;
>
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index e48302ecb389..574325276650 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -303,7 +303,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>         if (!map)
>                 return ERR_PTR(-ENOMEM);
>
> -       map->map.pages = pages;
> +       map->map.memory.pages = pages;
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&map->map, attr);
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index e61630c2e50b..8e423a582760 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -578,9 +578,9 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
>                 goto out_err;
>         }
>
> -       trie->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       trie->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       ret = bpf_map_precharge_memlock(trie->map.pages);
> +       ret = bpf_map_precharge_memlock(trie->map.memory.pages);
>         if (ret)
>                 goto out_err;
>
> diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> index 0b140d236889..8a510e71d486 100644
> --- a/kernel/bpf/queue_stack_maps.c
> +++ b/kernel/bpf/queue_stack_maps.c
> @@ -89,7 +89,7 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
>
>         bpf_map_init_from_attr(&qs->map, attr);
>
> -       qs->map.pages = cost;
> +       qs->map.memory.pages = cost;
>         qs->size = size;
>
>         raw_spin_lock_init(&qs->lock);
> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 18e225de80ff..819515242739 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -176,7 +176,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
>
>         /* copy mandatory map attributes */
>         bpf_map_init_from_attr(&array->map, attr);
> -       array->map.pages = cost;
> +       array->map.memory.pages = cost;
>
>         return &array->map;
>  }
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 950ab2f28922..08d4efff73ac 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -131,9 +131,9 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
>         bpf_map_init_from_attr(&smap->map, attr);
>         smap->map.value_size = value_size;
>         smap->n_buckets = n_buckets;
> -       smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       smap->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       err = bpf_map_precharge_memlock(smap->map.pages);
> +       err = bpf_map_precharge_memlock(smap->map.memory.pages);
>         if (err)
>                 goto free_smap;
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 3d546b6f4646..df14e63806c8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -222,19 +222,20 @@ static int bpf_map_init_memlock(struct bpf_map *map)
>         struct user_struct *user = get_current_user();
>         int ret;
>
> -       ret = bpf_charge_memlock(user, map->pages);
> +       ret = bpf_charge_memlock(user, map->memory.pages);
>         if (ret) {
>                 free_uid(user);
>                 return ret;
>         }
> -       map->user = user;
> +       map->memory.user = user;
>         return ret;
>  }
>
>  static void bpf_map_release_memlock(struct bpf_map *map)
>  {
> -       struct user_struct *user = map->user;
> -       bpf_uncharge_memlock(user, map->pages);
> +       struct user_struct *user = map->memory.user;
> +
> +       bpf_uncharge_memlock(user, map->memory.pages);
>         free_uid(user);
>  }
>
> @@ -242,17 +243,17 @@ int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
>  {
>         int ret;
>
> -       ret = bpf_charge_memlock(map->user, pages);
> +       ret = bpf_charge_memlock(map->memory.user, pages);
>         if (ret)
>                 return ret;
> -       map->pages += pages;
> +       map->memory.pages += pages;
>         return ret;
>  }
>
>  void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages)
>  {
> -       bpf_uncharge_memlock(map->user, pages);
> -       map->pages -= pages;
> +       bpf_uncharge_memlock(map->memory.user, pages);
> +       map->memory.pages -= pages;
>  }
>
>  static int bpf_map_alloc_id(struct bpf_map *map)
> @@ -395,7 +396,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
>                    map->value_size,
>                    map->max_entries,
>                    map->map_flags,
> -                  map->pages * 1ULL << PAGE_SHIFT,
> +                  map->memory.pages * 1ULL << PAGE_SHIFT,
>                    map->id,
>                    READ_ONCE(map->frozen));
>
> diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
> index 686d244e798d..f816ee1a0fa0 100644
> --- a/kernel/bpf/xskmap.c
> +++ b/kernel/bpf/xskmap.c
> @@ -40,10 +40,10 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>         if (cost >= U32_MAX - PAGE_SIZE)
>                 goto free_m;
>
> -       m->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       m->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       err = bpf_map_precharge_memlock(m->map.pages);
> +       err = bpf_map_precharge_memlock(m->map.memory.pages);
>         if (err)
>                 goto free_m;
>
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 9a8aaf8e235d..92581c3ff220 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -659,7 +659,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
>         smap->cache_idx = (unsigned int)atomic_inc_return(&cache_idx) %
>                 BPF_SK_STORAGE_CACHE_SIZE;
> -       smap->map.pages = pages;
> +       smap->map.memory.pages = pages;
>
>         return &smap->map;
>  }
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index be6092ac69f8..4eb5b6a1b29f 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -49,8 +49,8 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>                 goto free_stab;
>         }
>
> -       stab->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -       err = bpf_map_precharge_memlock(stab->map.pages);
> +       stab->map.memory.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +       err = bpf_map_precharge_memlock(stab->map.memory.pages);
>         if (err)
>                 goto free_stab;
>
> --
> 2.20.1
>


* Re: [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init()
  2019-05-30  1:03 ` [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init() Roman Gushchin
@ 2019-05-30 18:56   ` Song Liu
  2019-05-30 19:09     ` Roman Gushchin
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2019-05-30 18:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Wed, May 29, 2019 at 6:05 PM Roman Gushchin <guro@fb.com> wrote:
>
> Most bpf map types do similar size checks and bytes-to-pages
> conversions during memory allocation and charging.
>
> Let's unify these checks by moving them into bpf_map_charge_init().
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Nice, I was thinking about similar issues while reading patches
3/5 and 4/5. I really like this simplification.

Acked-by: Song Liu <songliubraving@fb.com>
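
The net effect at a typical call site, shown as a hypothetical
before/after ("cost" is the map size computed in bytes in both cases):

        /* before: every map type checked for overflow and converted
         * bytes to pages itself
         */
        if (cost >= U32_MAX - PAGE_SIZE)
                return ERR_PTR(-E2BIG);
        ret = bpf_map_charge_init(&mem,
                                  round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);

        /* after: pass the size in bytes; bpf_map_charge_init() rejects
         * oversized maps with -E2BIG and does the rounding internally
         */
        ret = bpf_map_charge_init(&mem, cost);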

> ---
>  include/linux/bpf.h           |  2 +-
>  kernel/bpf/arraymap.c         |  8 +-------
>  kernel/bpf/cpumap.c           |  5 +----
>  kernel/bpf/devmap.c           |  5 +----
>  kernel/bpf/hashtab.c          |  7 +------
>  kernel/bpf/local_storage.c    |  5 +----
>  kernel/bpf/lpm_trie.c         |  7 +------
>  kernel/bpf/queue_stack_maps.c |  4 ----
>  kernel/bpf/reuseport_array.c  | 10 ++--------
>  kernel/bpf/stackmap.c         |  8 +-------
>  kernel/bpf/syscall.c          |  9 +++++++--
>  kernel/bpf/xskmap.c           |  5 +----
>  net/core/bpf_sk_storage.c     |  4 +---
>  net/core/sock_map.c           |  8 +-------
>  14 files changed, 20 insertions(+), 67 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6187203b0414..3997c0038062 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -602,7 +602,7 @@ void bpf_map_put_with_uref(struct bpf_map *map);
>  void bpf_map_put(struct bpf_map *map);
>  int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
>  void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
> -int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages);
> +int bpf_map_charge_init(struct bpf_map_memory *mem, size_t size);
>  void bpf_map_charge_finish(struct bpf_map_memory *mem);
>  void bpf_map_charge_move(struct bpf_map_memory *dst,
>                          struct bpf_map_memory *src);
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 3552da4407d9..0349cbf23cdb 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -117,14 +117,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>
>         /* make sure there is no u32 overflow later in round_up() */
>         cost = array_size;
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               return ERR_PTR(-ENOMEM);
> -       if (percpu) {
> +       if (percpu)
>                 cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
> -               if (cost >= U32_MAX - PAGE_SIZE)
> -                       return ERR_PTR(-ENOMEM);
> -       }
> -       cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         ret = bpf_map_charge_init(&mem, cost);
>         if (ret < 0)
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index c633c8d68023..b31a71909307 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -106,12 +106,9 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>         /* make sure page count doesn't overflow */
>         cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
>         cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               goto free_cmap;
>
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       ret = bpf_map_charge_init(&cmap->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       ret = bpf_map_charge_init(&cmap->map.memory, cost);
>         if (ret) {
>                 err = ret;
>                 goto free_cmap;
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 371bd880ed58..5ae7cce5ef16 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -108,12 +108,9 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
>         /* make sure page count doesn't overflow */
>         cost = (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
>         cost += dev_map_bitmap_size(attr) * num_possible_cpus();
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               goto free_dtab;
>
>         /* if map size is larger than memlock limit, reject it */
> -       err = bpf_map_charge_init(&dtab->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       err = bpf_map_charge_init(&dtab->map.memory, cost);
>         if (err)
>                 goto free_dtab;
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index b0bdc7b040ad..d92e05d9979b 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -360,13 +360,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>         else
>                cost += (u64) htab->elem_size * num_possible_cpus();
>
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               /* make sure page count doesn't overflow */
> -               goto free_htab;
> -
>         /* if map size is larger than memlock limit, reject it */
> -       err = bpf_map_charge_init(&htab->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       err = bpf_map_charge_init(&htab->map.memory, cost);
>         if (err)
>                 goto free_htab;
>
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index e49bfd4f4f6d..addd6fdceec8 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -273,7 +273,6 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>         int numa_node = bpf_map_attr_numa_node(attr);
>         struct bpf_cgroup_storage_map *map;
>         struct bpf_map_memory mem;
> -       u32 pages;
>         int ret;
>
>         if (attr->key_size != sizeof(struct bpf_cgroup_storage_key))
> @@ -293,9 +292,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>                 /* max_entries is not used and enforced to be 0 */
>                 return ERR_PTR(-EINVAL);
>
> -       pages = round_up(sizeof(struct bpf_cgroup_storage_map), PAGE_SIZE) >>
> -               PAGE_SHIFT;
> -       ret = bpf_map_charge_init(&mem, pages);
> +       ret = bpf_map_charge_init(&mem, sizeof(struct bpf_cgroup_storage_map));
>         if (ret < 0)
>                 return ERR_PTR(ret);
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 6345a8d2dcd0..09334f13a8a0 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -573,13 +573,8 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
>         cost_per_node = sizeof(struct lpm_trie_node) +
>                         attr->value_size + trie->data_size;
>         cost += (u64) attr->max_entries * cost_per_node;
> -       if (cost >= U32_MAX - PAGE_SIZE) {
> -               ret = -E2BIG;
> -               goto out_err;
> -       }
>
> -       ret = bpf_map_charge_init(&trie->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       ret = bpf_map_charge_init(&trie->map.memory, cost);
>         if (ret)
>                 goto out_err;
>
> diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> index 224cb0fd8f03..f697647ceb54 100644
> --- a/kernel/bpf/queue_stack_maps.c
> +++ b/kernel/bpf/queue_stack_maps.c
> @@ -73,10 +73,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
>
>         size = (u64) attr->max_entries + 1;
>         cost = queue_size = sizeof(*qs) + size * attr->value_size;
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               return ERR_PTR(-E2BIG);
> -
> -       cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
>         ret = bpf_map_charge_init(&mem, cost);
>         if (ret < 0)
> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 5c6e25b1b9b1..50c083ba978c 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -152,7 +152,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
>         int err, numa_node = bpf_map_attr_numa_node(attr);
>         struct reuseport_array *array;
>         struct bpf_map_memory mem;
> -       u64 cost, array_size;
> +       u64 array_size;
>
>         if (!capable(CAP_SYS_ADMIN))
>                 return ERR_PTR(-EPERM);
> @@ -160,13 +160,7 @@ static struct bpf_map *reuseport_array_alloc(union bpf_attr *attr)
>         array_size = sizeof(*array);
>         array_size += (u64)attr->max_entries * sizeof(struct sock *);
>
> -       /* make sure there is no u32 overflow later in round_up() */
> -       cost = array_size;
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               return ERR_PTR(-ENOMEM);
> -       cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> -
> -       err = bpf_map_charge_init(&mem, cost);
> +       err = bpf_map_charge_init(&mem, array_size);
>         if (err)
>                 return ERR_PTR(err);
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 8da24ca65d97..3d86072d8e32 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -117,14 +117,8 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
>         n_buckets = roundup_pow_of_two(attr->max_entries);
>
>         cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap);
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               return ERR_PTR(-E2BIG);
>         cost += n_buckets * (value_size + sizeof(struct stack_map_bucket));
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               return ERR_PTR(-E2BIG);
> -
> -       err = bpf_map_charge_init(&mem,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       err = bpf_map_charge_init(&mem, cost);
>         if (err)
>                 return ERR_PTR(err);
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 351cc434c4ad..b3e83712b982 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -205,11 +205,16 @@ static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
>                 atomic_long_sub(pages, &user->locked_vm);
>  }
>
> -int bpf_map_charge_init(struct bpf_map_memory *mem, u32 pages)
> +int bpf_map_charge_init(struct bpf_map_memory *mem, size_t size)
>  {
> -       struct user_struct *user = get_current_user();
> +       u32 pages = round_up(size, PAGE_SIZE) >> PAGE_SHIFT;
> +       struct user_struct *user;
>         int ret;
>
> +       if (size >= U32_MAX - PAGE_SIZE)
> +               return -E2BIG;
> +
> +       user = get_current_user();
>         ret = bpf_charge_memlock(user, pages);
>         if (ret) {
>                 free_uid(user);
> diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
> index a329dab7c7a4..22066c28ba61 100644
> --- a/kernel/bpf/xskmap.c
> +++ b/kernel/bpf/xskmap.c
> @@ -37,12 +37,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
>
>         cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
>         cost += sizeof(struct list_head) * num_possible_cpus();
> -       if (cost >= U32_MAX - PAGE_SIZE)
> -               goto free_m;
>
>         /* Notice returns -EPERM on if map size is larger than memlock limit */
> -       err = bpf_map_charge_init(&m->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       err = bpf_map_charge_init(&m->map.memory, cost);
>         if (err)
>                 goto free_m;
>
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 621a0b07ff11..f40e3d35fd9c 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -626,7 +626,6 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         struct bpf_sk_storage_map *smap;
>         unsigned int i;
>         u32 nbuckets;
> -       u32 pages;
>         u64 cost;
>         int ret;
>
> @@ -638,9 +637,8 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
>         smap->bucket_log = ilog2(roundup_pow_of_two(num_possible_cpus()));
>         nbuckets = 1U << smap->bucket_log;
>         cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
> -       pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
>
> -       ret = bpf_map_charge_init(&smap->map.memory, pages);
> +       ret = bpf_map_charge_init(&smap->map.memory, cost);
>         if (ret < 0) {
>                 kfree(smap);
>                 return ERR_PTR(ret);
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 1028c922a149..52d4faeee18b 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -44,13 +44,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>
>         /* Make sure page count doesn't overflow. */
>         cost = (u64) stab->map.max_entries * sizeof(struct sock *);
> -       if (cost >= U32_MAX - PAGE_SIZE) {
> -               err = -EINVAL;
> -               goto free_stab;
> -       }
> -
> -       err = bpf_map_charge_init(&stab->map.memory,
> -                                 round_up(cost, PAGE_SIZE) >> PAGE_SHIFT);
> +       err = bpf_map_charge_init(&stab->map.memory, cost);
>         if (err)
>                 goto free_stab;
>
> --
> 2.20.1
>


* Re: [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init()
  2019-05-30 18:56   ` Song Liu
@ 2019-05-30 19:09     ` Roman Gushchin
  0 siblings, 0 replies; 13+ messages in thread
From: Roman Gushchin @ 2019-05-30 19:09 UTC (permalink / raw)
  To: Song Liu; +Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, open list

On Thu, May 30, 2019 at 11:56:55AM -0700, Song Liu wrote:
> On Wed, May 29, 2019 at 6:05 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Most bpf map types do similar size checks and bytes-to-pages
> > conversions during memory allocation and charging.
> >
> > Let's unify these checks by moving them into bpf_map_charge_init().
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Nice, I was thinking about similar issues while reading patches
> 3/5 and 4/5. I really like this simplification.
> 
> Acked-by: Song Liu <songliubraving@fb.com>

Thank you for the review, Song!


* Re: [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup
  2019-05-30  1:03 [PATCH bpf-next 0/5] bpf: bpf maps memory accounting cleanup Roman Gushchin
                   ` (4 preceding siblings ...)
  2019-05-30  1:03 ` [PATCH bpf-next 5/5] bpf: move memory size checks to bpf_map_charge_init() Roman Gushchin
@ 2019-06-01  0:00 ` Alexei Starovoitov
  5 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2019-06-01  0:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Kernel Team, LKML

On Wed, May 29, 2019 at 6:04 PM Roman Gushchin <guro@fb.com> wrote:
>
> During my work on memcg-based memory accounting for bpf maps
> I've done some cleanups and refactorings of the existing
> memlock rlimit-based code. It makes it more robust, unifies
> size to pages conversion, size checks and corresponding error
> codes. Also it adds coverage for cgroup local storage and
> socket local storage maps.
>
> It looks like some preliminary work on the mm side might be
> required to start working on the memcg-based accounting,
> so I'm sending these patches as a separate patchset.

Applied. Thanks


