netdev.vger.kernel.org archive mirror
* [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting
@ 2020-11-12 22:15 Roman Gushchin
  2020-11-12 22:15 ` [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

Currently bpf is using the memlock rlimit for the memory accounting.
This approach has its downsides and over time has created a significant
amount of problems:

1) The limit is per-user, but because most bpf operations are performed
   as root, the limit has little value.

2) It's hard to come up with a specific maximum value, especially because
   the counter is shared with non-bpf users (e.g. memlock() users).
   Any specific value is either too low and creates false failures
   or too high and useless.

3) Charging is not connected to the actual memory allocation. Bpf code
   has to manually calculate the estimated cost, precharge the counter,
   and then take care of uncharging, including on all failure paths.
   This adds to the code complexity and makes it easy to leak a charge.

4) There is no simple way of getting the current value of the counter.
   We've used drgn for it, but it's far from convenient.

5) A cryptic -EPERM is returned when the limit is exceeded. Libbpf even
   had a function to "explain" this case to users.
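The precharge pattern from point 3 can be modelled in plain userspace C. This is a simplified sketch under stated assumptions, not kernel code: struct user_acct, precharge(), uncharge() and create_object() are hypothetical names, 4 KiB pages are assumed, and limit_pages stands in for RLIMIT_MEMLOCK. The page rounding and the -E2BIG check follow the bpf_map_charge_init() helper that this series removes, and the -EPERM on overflow matches point 5.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical per-user counter, standing in for user->locked_vm. */
struct user_acct {
	unsigned long locked_pages;
	unsigned long limit_pages;      /* stands in for RLIMIT_MEMLOCK */
};

/* Estimate the cost in pages and charge it before allocating anything. */
static int precharge(struct user_acct *u, uint64_t size, uint32_t *pages_out)
{
	uint32_t pages;

	if (size >= UINT32_MAX - PAGE_SIZE)
		return -E2BIG;
	pages = (uint32_t)((size + PAGE_SIZE - 1) >> PAGE_SHIFT); /* round up */
	if (u->locked_pages + pages > u->limit_pages)
		return -EPERM;          /* the error users see on overflow */
	u->locked_pages += pages;
	*pages_out = pages;
	return 0;
}

static void uncharge(struct user_acct *u, uint32_t pages)
{
	assert(u->locked_pages >= pages);   /* catch double uncharges */
	u->locked_pages -= pages;
}

/* Allocate an object, keeping the separate counter in sync by hand. */
static void *create_object(struct user_acct *u, uint64_t size, int *err)
{
	uint32_t pages;
	void *p;

	*err = precharge(u, size, &pages);
	if (*err)
		return NULL;
	p = calloc(1, (size_t)size);
	if (!p) {
		uncharge(u, pages);     /* forgetting this leaks the charge */
		*err = -ENOMEM;
	}
	return p;
}
```

Note that create_object() has to undo the charge by hand when the allocation fails, and the caller has to remember the page count to uncharge on free; every user of this pattern repeats that bookkeeping.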

To overcome these problems, let's switch to memcg-based memory
accounting of bpf objects. With the recent addition of percpu memory
accounting, it's now possible to provide comprehensive accounting
of the memory used by bpf programs and maps.

This approach has the following advantages:
1) The limit is per-cgroup and hierarchical. It's far more flexible and allows
   better control over memory usage by different workloads. Of course, it
   requires cgroups and kernel memory accounting to be enabled and a properly
   configured cgroup tree, but that is the default configuration for a modern
   Linux system.

2) The actual memory consumption is taken into account. Charging happens
   automatically at allocation time if the __GFP_ACCOUNT flag is passed.
   Uncharging is also performed automatically when the memory is released.
   So the code on the bpf side becomes simpler and safer.

3) There is a simple way to get the current value and statistics through
   the memory cgroup interface files (e.g. memory.current and memory.stat
   on cgroup v2).
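Points 1 and 2 can be illustrated with a toy model of a hierarchical counter whose charge is tied to the allocation itself. This is a hedged sketch, not the kernel's page_counter or memcg code: struct cgroup_model, try_charge(), uncharge_tree(), alloc_accounted() and free_accounted() are hypothetical. A charge must fit under the limit of the cgroup and of every ancestor, and a refused charge is rolled back. Unlike the kernel, which remembers the owning memcg for each accounted page or object, this model makes the caller pass the size back on free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of one node in a cgroup tree. */
struct cgroup_model {
	const char *name;
	long usage;                 /* bytes charged to this subtree */
	long max;                   /* limit for this subtree */
	struct cgroup_model *parent;
};

/* Charge @nr bytes to @cg and all ancestors; roll back if any limit is hit. */
static bool try_charge(struct cgroup_model *cg, long nr)
{
	struct cgroup_model *c, *undo;

	for (c = cg; c; c = c->parent) {
		if (c->usage + nr > c->max)
			goto fail;
		c->usage += nr;
	}
	return true;
fail:
	/* Undo the levels below the one that refused the charge. */
	for (undo = cg; undo != c; undo = undo->parent)
		undo->usage -= nr;
	return false;
}

static void uncharge_tree(struct cgroup_model *cg, long nr)
{
	for (; cg; cg = cg->parent) {
		assert(cg->usage >= nr);
		cg->usage -= nr;
	}
}

/* Analogue of an allocation carrying __GFP_ACCOUNT: charge and allocate together. */
static void *alloc_accounted(struct cgroup_model *cg, size_t size)
{
	void *p;

	if (!try_charge(cg, (long)size))
		return NULL;
	p = malloc(size);
	if (!p)
		uncharge_tree(cg, (long)size);
	return p;
}

/* Freeing the memory releases the charge; callers keep no separate counter. */
static void free_accounted(struct cgroup_model *cg, void *p, size_t size)
{
	free(p);
	uncharge_tree(cg, (long)size);
}
```

With a root limit of 1000 and two children limited to 600 each, a 500-byte allocation in one child leaves the other child unable to take 600 bytes even though its own limit allows it, because the root would exceed its limit.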

In general, if a process performs a bpf operation (e.g. creates or updates
a map), its memory cgroup is charged. However, map updates performed from
an interrupt context are charged to the memory cgroup which contained
the process that created the map.

Providing a 1:1 replacement for the rlimit-based memory accounting is
a non-goal of this patchset. Users and memory cgroups are completely
orthogonal, so it's not possible even in theory.
Memcg-based memory accounting requires a properly configured cgroup tree
to be actually useful. However, this is how memory is managed
on a modern Linux system.


The patchset consists of the following parts:
1) 4 mm patches, which are already in the mm tree, but are required
   to avoid a regression (otherwise vmallocs cannot be mapped to userspace).
2) memcg-based accounting for various bpf objects: progs and maps
3) removal of the rlimit-based accounting
4) removal of rlimit adjustments in userspace samples

The first 4 patches are not supposed to be merged via the bpf tree. I'm
including them to make sure the bpf tests pass.

v5:
  - rebased to the latest version of the remote charging API
  - implemented kmem accounting from an interrupt context, by Shakeel
  - rebased to the latest mm changes, which allow mapping vmallocs to userspace
  - fixed a build issue in kselftests, by Alexei
  - fixed a use-after-free bug in bpf_map_free_deferred()
  - added bpf line info coverage, by Shakeel
  - split bpf map charging preparations into a separate patch

v4:
  - covered allocations made from an interrupt context, by Daniel
  - added some clarifications to the cover letter

v3:
  - dropped the userspace part for further discussions/refinements,
    by Andrii and Song

v2:
  - fixed a build issue caused by the remaining rlimit-based accounting
    for sockhash maps


Roman Gushchin (34):
  mm: memcontrol: use helpers to read page's memcg data
  mm: memcontrol/slab: use helpers to access slab page's memcg_data
  mm: introduce page memcg flags
  mm: convert page kmemcg type to a page memcg flag
  bpf: memcg-based memory accounting for bpf progs
  bpf: prepare for memcg-based memory accounting for bpf maps
  bpf: memcg-based memory accounting for bpf maps
  bpf: refine memcg-based memory accounting for arraymap maps
  bpf: refine memcg-based memory accounting for cpumap maps
  bpf: memcg-based memory accounting for cgroup storage maps
  bpf: refine memcg-based memory accounting for devmap maps
  bpf: refine memcg-based memory accounting for hashtab maps
  bpf: memcg-based memory accounting for lpm_trie maps
  bpf: memcg-based memory accounting for bpf ringbuffer
  bpf: memcg-based memory accounting for bpf local storage maps
  bpf: refine memcg-based memory accounting for sockmap and sockhash
    maps
  bpf: refine memcg-based memory accounting for xskmap maps
  bpf: eliminate rlimit-based memory accounting for arraymap maps
  bpf: eliminate rlimit-based memory accounting for bpf_struct_ops maps
  bpf: eliminate rlimit-based memory accounting for cpumap maps
  bpf: eliminate rlimit-based memory accounting for cgroup storage maps
  bpf: eliminate rlimit-based memory accounting for devmap maps
  bpf: eliminate rlimit-based memory accounting for hashtab maps
  bpf: eliminate rlimit-based memory accounting for lpm_trie maps
  bpf: eliminate rlimit-based memory accounting for queue_stack_maps
    maps
  bpf: eliminate rlimit-based memory accounting for reuseport_array maps
  bpf: eliminate rlimit-based memory accounting for bpf ringbuffer
  bpf: eliminate rlimit-based memory accounting for sockmap and sockhash
    maps
  bpf: eliminate rlimit-based memory accounting for stackmap maps
  bpf: eliminate rlimit-based memory accounting for xskmap maps
  bpf: eliminate rlimit-based memory accounting for bpf local storage
    maps
  bpf: eliminate rlimit-based memory accounting infra for bpf maps
  bpf: eliminate rlimit-based memory accounting for bpf progs
  bpf: samples: do not touch RLIMIT_MEMLOCK

 fs/buffer.c                                   |   2 +-
 fs/iomap/buffered-io.c                        |   2 +-
 include/linux/bpf.h                           |  27 +--
 include/linux/memcontrol.h                    | 215 +++++++++++++++++-
 include/linux/mm.h                            |  22 --
 include/linux/mm_types.h                      |   5 +-
 include/linux/page-flags.h                    |  11 +-
 include/trace/events/writeback.h              |   2 +-
 kernel/bpf/arraymap.c                         |  30 +--
 kernel/bpf/bpf_local_storage.c                |  18 +-
 kernel/bpf/bpf_struct_ops.c                   |  19 +-
 kernel/bpf/core.c                             |  22 +-
 kernel/bpf/cpumap.c                           |  20 +-
 kernel/bpf/devmap.c                           |  23 +-
 kernel/bpf/hashtab.c                          |  33 +--
 kernel/bpf/helpers.c                          |  37 ++-
 kernel/bpf/local_storage.c                    |  38 +---
 kernel/bpf/lpm_trie.c                         |  17 +-
 kernel/bpf/queue_stack_maps.c                 |  16 +-
 kernel/bpf/reuseport_array.c                  |  12 +-
 kernel/bpf/ringbuf.c                          |  33 +--
 kernel/bpf/stackmap.c                         |  16 +-
 kernel/bpf/syscall.c                          | 177 ++++----------
 kernel/fork.c                                 |   7 +-
 mm/debug.c                                    |   4 +-
 mm/huge_memory.c                              |   4 +-
 mm/memcontrol.c                               | 139 +++++------
 mm/page_alloc.c                               |   8 +-
 mm/page_io.c                                  |   6 +-
 mm/slab.h                                     |  38 +---
 mm/workingset.c                               |   2 +-
 net/core/bpf_sk_storage.c                     |   2 +-
 net/core/sock_map.c                           |  40 +---
 net/xdp/xskmap.c                              |  15 +-
 samples/bpf/map_perf_test_user.c              |   6 -
 samples/bpf/offwaketime_user.c                |   6 -
 samples/bpf/sockex2_user.c                    |   2 -
 samples/bpf/sockex3_user.c                    |   2 -
 samples/bpf/spintest_user.c                   |   6 -
 samples/bpf/syscall_tp_user.c                 |   2 -
 samples/bpf/task_fd_query_user.c              |   5 -
 samples/bpf/test_lru_dist.c                   |   3 -
 samples/bpf/test_map_in_map_user.c            |   6 -
 samples/bpf/test_overhead_user.c              |   2 -
 samples/bpf/trace_event_user.c                |   2 -
 samples/bpf/tracex2_user.c                    |   6 -
 samples/bpf/tracex3_user.c                    |   6 -
 samples/bpf/tracex4_user.c                    |   6 -
 samples/bpf/tracex5_user.c                    |   3 -
 samples/bpf/tracex6_user.c                    |   3 -
 samples/bpf/xdp1_user.c                       |   6 -
 samples/bpf/xdp_adjust_tail_user.c            |   6 -
 samples/bpf/xdp_monitor_user.c                |   5 -
 samples/bpf/xdp_redirect_cpu_user.c           |   6 -
 samples/bpf/xdp_redirect_map_user.c           |   6 -
 samples/bpf/xdp_redirect_user.c               |   6 -
 samples/bpf/xdp_router_ipv4_user.c            |   6 -
 samples/bpf/xdp_rxq_info_user.c               |   6 -
 samples/bpf/xdp_sample_pkts_user.c            |   6 -
 samples/bpf/xdp_tx_iptunnel_user.c            |   6 -
 samples/bpf/xdpsock_user.c                    |   7 -
 .../selftests/bpf/progs/bpf_iter_bpf_map.c    |   2 +-
 .../selftests/bpf/progs/map_ptr_kern.c        |   7 -
 63 files changed, 460 insertions(+), 743 deletions(-)

-- 
2.26.2



* [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 17:31   ` Song Liu
  2020-11-12 22:15 ` [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps Roman Gushchin
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

Include memory used by bpf programs into the memcg-based accounting.
This includes the memory used by the programs themselves, auxiliary data,
statistics and bpf line info. The memory cgroup containing the
process which loads the program is charged.
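In the kernel headers GFP_KERNEL_ACCOUNT is defined as GFP_KERNEL | __GFP_ACCOUNT, so each hunk in this patch only adds the accounting bit. Because gfp_extra_flags is ORed in, a caller can add flags such as __GFP_NOWARN but cannot remove accounting. The sketch below models only this flag arithmetic; the M_GFP_* names, prog_alloc_flags() and is_accounted() are hypothetical, and the bit values are made up and do not match the kernel's.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t gfp_model_t;

/* Made-up bit values for illustration; the kernel's real values differ. */
#define M_GFP_IO      (1u << 0)
#define M_GFP_FS      (1u << 1)
#define M_GFP_RECLAIM (1u << 2)
#define M_GFP_ZERO    (1u << 3)
#define M_GFP_NOWARN  (1u << 4)
#define M_GFP_ACCOUNT (1u << 5)

/* Same composition as the kernel macros, using the made-up bits above. */
#define M_GFP_KERNEL         (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)
#define M_GFP_KERNEL_ACCOUNT (M_GFP_KERNEL | M_GFP_ACCOUNT)

static_assert((M_GFP_KERNEL_ACCOUNT & M_GFP_KERNEL) == M_GFP_KERNEL,
	      "the accounting variant keeps every GFP_KERNEL bit");

/* Mirrors how bpf_prog_alloc_no_stats() builds its flags after this patch. */
static gfp_model_t prog_alloc_flags(gfp_model_t gfp_extra_flags)
{
	return M_GFP_KERNEL_ACCOUNT | M_GFP_ZERO | gfp_extra_flags;
}

static int is_accounted(gfp_model_t flags)
{
	return (flags & M_GFP_ACCOUNT) != 0;
}
```

Whatever a caller passes as extra flags, the result still carries the accounting bit and all of the GFP_KERNEL bits.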

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/core.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9268d77898b7..8346ebcbde99 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -77,7 +77,7 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
 
 struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
 	struct bpf_prog_aux *aux;
 	struct bpf_prog *fp;
 
@@ -86,7 +86,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	if (fp == NULL)
 		return NULL;
 
-	aux = kzalloc(sizeof(*aux), GFP_KERNEL | gfp_extra_flags);
+	aux = kzalloc(sizeof(*aux), GFP_KERNEL_ACCOUNT | gfp_extra_flags);
 	if (aux == NULL) {
 		vfree(fp);
 		return NULL;
@@ -106,7 +106,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 
 struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
 	struct bpf_prog *prog;
 	int cpu;
 
@@ -138,7 +138,7 @@ int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog)
 
 	prog->aux->jited_linfo = kcalloc(prog->aux->nr_linfo,
 					 sizeof(*prog->aux->jited_linfo),
-					 GFP_KERNEL | __GFP_NOWARN);
+					 GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
 	if (!prog->aux->jited_linfo)
 		return -ENOMEM;
 
@@ -219,7 +219,7 @@ void bpf_prog_free_linfo(struct bpf_prog *prog)
 struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
 				  gfp_t gfp_extra_flags)
 {
-	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags;
+	gfp_t gfp_flags = GFP_KERNEL_ACCOUNT | __GFP_ZERO | gfp_extra_flags;
 	struct bpf_prog *fp;
 	u32 pages, delta;
 	int ret;
-- 
2.26.2



* [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
  2020-11-12 22:15 ` [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 17:46   ` Song Liu
  2020-11-12 22:15 ` [PATCH bpf-next v5 07/34] bpf: " Roman Gushchin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

In the vast majority of cases, if a process makes a kernel
allocation, its memory cgroup is charged.

Bpf maps can be updated from an interrupt context, and in such a
case there is no process which can be charged. This makes the memory
accounting of bpf maps non-trivial.

Fortunately, after commits 4127c6504f25 ("mm: kmem: enable kernel
memcg accounting from interrupt contexts") and b87d8cefe43c
("mm, memcg: rework remote charging API to support nesting")
it's finally possible.

To do this, a pointer to the memory cgroup of the process which created
the map is saved, and this cgroup is charged for all
allocations made from an interrupt context.

Allocations made from a process context are accounted in the usual way.
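The save/switch/restore pattern used by __bpf_map_update_elem() in this patch can be modelled in userspace. This is a simplified sketch: struct memcg_model, struct map_model, set_active(), account_alloc() and map_update() are hypothetical, and a single global stands in for the kernel's separate task and interrupt state. Returning the previous value from set_active() and restoring it afterwards is what lets such scopes nest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a charge counter. */
struct memcg_model {
	const char *name;
	long charged;
};

/* Simplification: one global instead of separate per-task/per-CPU state. */
static struct memcg_model *active_memcg;

/* Like set_active_memcg(): install @memcg and return the previous value. */
static struct memcg_model *set_active(struct memcg_model *memcg)
{
	struct memcg_model *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/*
 * Charge an allocation. An explicitly set memcg wins; otherwise a task
 * charges its own memcg. In this model an interrupt with no active memcg
 * charges nobody, since there is no meaningful "current" task.
 */
static void account_alloc(bool in_irq, struct memcg_model *task_memcg, long bytes)
{
	struct memcg_model *target = active_memcg;

	if (!target && !in_irq)
		target = task_memcg;
	if (target)
		target->charged += bytes;
}

/* A map remembers the memcg of the process that created it. */
struct map_model {
	struct memcg_model *memcg;
};

/* Same shape as __bpf_map_update_elem(): switch only in interrupt context. */
static void map_update(struct map_model *map, bool in_irq,
		       struct memcg_model *task_memcg, long bytes)
{
	struct memcg_model *old = NULL;

	assert(map->memcg);             /* saved when the map was created */
	if (in_irq)
		old = set_active(map->memcg);
	account_alloc(in_irq, task_memcg, bytes);  /* allocation during the update */
	if (in_irq)
		set_active(old);        /* restoring the old value keeps nesting correct */
}
```

An update from task context charges the updating task, an update from interrupt context charges the map's creator, and an outer active memcg set by someone else is still in place after the nested switch returns.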

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h  |  4 ++++
 kernel/bpf/helpers.c | 37 ++++++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c | 25 +++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 581b2a2e78eb..1d6e7b125877 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -37,6 +37,7 @@ struct bpf_iter_aux_info;
 struct bpf_local_storage;
 struct bpf_local_storage_map;
 struct kobject;
+struct mem_cgroup;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -161,6 +162,9 @@ struct bpf_map {
 	u32 btf_value_type_id;
 	struct btf *btf;
 	struct bpf_map_memory memory;
+#ifdef CONFIG_MEMCG_KMEM
+	struct mem_cgroup *memcg;
+#endif
 	char name[BPF_OBJ_NAME_LEN];
 	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 25520f5eeaf6..b6327cbe7e41 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,7 @@
 #include <linux/jiffies.h>
 #include <linux/pid_namespace.h>
 #include <linux/proc_ns.h>
+#include <linux/sched/mm.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -41,11 +42,45 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+#ifdef CONFIG_MEMCG_KMEM
+static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
+						 void *value, u64 flags)
+{
+	struct mem_cgroup *old_memcg;
+	bool in_interrupt;
+	int ret;
+
+	/*
+	 * If update from an interrupt context results in a memory allocation,
+	 * the memory cgroup to charge can't be determined from the context
+	 * of the current task. Instead, we charge the memory cgroup, which
+	 * contained the process that created the map.
+	 */
+	in_interrupt = in_interrupt();
+	if (in_interrupt)
+		old_memcg = set_active_memcg(map->memcg);
+
+	ret = map->ops->map_update_elem(map, key, value, flags);
+
+	if (in_interrupt)
+		set_active_memcg(old_memcg);
+
+	return ret;
+}
+#else
+static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
+						 void *value, u64 flags)
+{
+	return map->ops->map_update_elem(map, key, value, flags);
+}
+#endif
+
 BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
 	   void *, value, u64, flags)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
-	return map->ops->map_update_elem(map, key, value, flags);
+
+	return __bpf_map_update_elem(map, key, value, flags);
 }
 
 const struct bpf_func_proto bpf_map_update_elem_proto = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f3fe9f53f93c..2d77fc2496da 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/memcontrol.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -456,6 +457,27 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 		__release(&map_idr_lock);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+	map->memcg = get_mem_cgroup_from_mm(current->mm);
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+	mem_cgroup_put(map->memcg);
+}
+
+#else
+static void bpf_map_save_memcg(struct bpf_map *map)
+{
+}
+
+static void bpf_map_release_memcg(struct bpf_map *map)
+{
+}
+#endif
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -464,6 +486,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 
 	bpf_map_charge_move(&mem, &map->memory);
 	security_bpf_map_free(map);
+	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
 	bpf_map_charge_finish(&mem);
@@ -875,6 +898,8 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_sec;
 
+	bpf_map_save_memcg(map);
+
 	err = bpf_map_new_fd(map, f_flags);
 	if (err < 0) {
 		/* failed to allocate fd.
-- 
2.26.2



* [PATCH bpf-next v5 07/34] bpf: memcg-based memory accounting for bpf maps
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
  2020-11-12 22:15 ` [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
  2020-11-12 22:15 ` [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 18:04   ` Song Liu
  2020-11-12 22:15 ` [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps Roman Gushchin
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

This patch enables memcg-based memory accounting for memory allocated
by __bpf_map_area_alloc(), which is used by many types of bpf maps for
large memory allocations.

Subsequent patches in the series refine the accounting for
some of the map types.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2d77fc2496da..fcadf953989f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -280,7 +280,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
 	 * __GFP_RETRY_MAYFAIL to avoid such situations.
 	 */
 
-	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
+	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
 	unsigned int flags = 0;
 	unsigned long align = 1;
 	void *area;
-- 
2.26.2



* [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
                   ` (2 preceding siblings ...)
  2020-11-12 22:15 ` [PATCH bpf-next v5 07/34] bpf: " Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 18:07   ` Song Liu
  2020-11-12 22:15 ` [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based " Roman Gushchin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

Account memory used by bpf local storage maps:
per-socket and per-inode storage.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/bpf_local_storage.c | 7 ++++---
 net/core/bpf_sk_storage.c      | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 5d3a7af9ba9b..fd4f9ac1d042 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -67,7 +67,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
 	if (charge_mem && mem_charge(smap, owner, smap->elem_size))
 		return NULL;
 
-	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
+	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN |
+			__GFP_ACCOUNT);
 	if (selem) {
 		if (value)
 			memcpy(SDATA(selem)->data, value, smap->map.value_size);
@@ -546,7 +547,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	u64 cost;
 	int ret;
 
-	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
+	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap)
 		return ERR_PTR(-ENOMEM);
 	bpf_map_init_from_attr(&smap->map, attr);
@@ -564,7 +565,7 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	}
 
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN);
+				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
 		bpf_map_charge_finish(&smap->map.memory);
 		kfree(smap);
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index c907f0dc7f87..1d9704bb2eca 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -453,7 +453,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
 	}
 
 	diag = kzalloc(sizeof(*diag) + sizeof(diag->maps[0]) * nr_maps,
-		       GFP_KERNEL);
+		       GFP_KERNEL_ACCOUNT);
 	if (!diag)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.26.2



* [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based memory accounting for bpf local storage maps
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
                   ` (3 preceding siblings ...)
  2020-11-12 22:15 ` [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 18:14   ` Song Liu
  2020-11-12 22:15 ` [PATCH bpf-next v5 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps Roman Gushchin
       [not found] ` <20201112221543.3621014-2-guro@fb.com>
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

Do not use rlimit-based memory accounting for bpf local storage maps.
It has been replaced with the memcg-based memory accounting.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/bpf_local_storage.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index fd4f9ac1d042..3b0da5a04d55 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -544,8 +544,6 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	struct bpf_local_storage_map *smap;
 	unsigned int i;
 	u32 nbuckets;
-	u64 cost;
-	int ret;
 
 	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap)
@@ -556,18 +554,9 @@ struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
 	nbuckets = max_t(u32, 2, nbuckets);
 	smap->bucket_log = ilog2(nbuckets);
-	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
-
-	ret = bpf_map_charge_init(&smap->map.memory, cost);
-	if (ret < 0) {
-		kfree(smap);
-		return ERR_PTR(ret);
-	}
-
 	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
 				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
 	if (!smap->buckets) {
-		bpf_map_charge_finish(&smap->map.memory);
 		kfree(smap);
 		return ERR_PTR(-ENOMEM);
 	}
-- 
2.26.2



* [PATCH bpf-next v5 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps
  2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
                   ` (4 preceding siblings ...)
  2020-11-12 22:15 ` [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based " Roman Gushchin
@ 2020-11-12 22:15 ` Roman Gushchin
  2020-11-13 18:17   ` Song Liu
       [not found] ` <20201112221543.3621014-2-guro@fb.com>
  6 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-12 22:15 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Andrii Nakryiko,
	Shakeel Butt, linux-mm, linux-kernel, kernel-team,
	Roman Gushchin

Remove rlimit-based accounting infrastructure code, which is not used
anymore.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/bpf.h                           | 12 ----
 kernel/bpf/syscall.c                          | 64 +------------------
 .../selftests/bpf/progs/bpf_iter_bpf_map.c    |  2 +-
 .../selftests/bpf/progs/map_ptr_kern.c        |  7 --
 4 files changed, 3 insertions(+), 82 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1d6e7b125877..6f1ef8a1e25f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -136,11 +136,6 @@ struct bpf_map_ops {
 	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
-struct bpf_map_memory {
-	u32 pages;
-	struct user_struct *user;
-};
-
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -161,7 +156,6 @@ struct bpf_map {
 	u32 btf_key_type_id;
 	u32 btf_value_type_id;
 	struct btf *btf;
-	struct bpf_map_memory memory;
 #ifdef CONFIG_MEMCG_KMEM
 	struct mem_cgroup *memcg;
 #endif
@@ -1222,12 +1216,6 @@ void bpf_map_inc_with_uref(struct bpf_map *map);
 struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
-int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
-void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
-int bpf_map_charge_init(struct bpf_map_memory *mem, u64 size);
-void bpf_map_charge_finish(struct bpf_map_memory *mem);
-void bpf_map_charge_move(struct bpf_map_memory *dst,
-			 struct bpf_map_memory *src);
 void *bpf_map_area_alloc(u64 size, int numa_node);
 void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
 void bpf_map_area_free(void *base);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fcadf953989f..9f41edbae3f8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -359,60 +359,6 @@ static void bpf_uncharge_memlock(struct user_struct *user, u32 pages)
 		atomic_long_sub(pages, &user->locked_vm);
 }
 
-int bpf_map_charge_init(struct bpf_map_memory *mem, u64 size)
-{
-	u32 pages = round_up(size, PAGE_SIZE) >> PAGE_SHIFT;
-	struct user_struct *user;
-	int ret;
-
-	if (size >= U32_MAX - PAGE_SIZE)
-		return -E2BIG;
-
-	user = get_current_user();
-	ret = bpf_charge_memlock(user, pages);
-	if (ret) {
-		free_uid(user);
-		return ret;
-	}
-
-	mem->pages = pages;
-	mem->user = user;
-
-	return 0;
-}
-
-void bpf_map_charge_finish(struct bpf_map_memory *mem)
-{
-	bpf_uncharge_memlock(mem->user, mem->pages);
-	free_uid(mem->user);
-}
-
-void bpf_map_charge_move(struct bpf_map_memory *dst,
-			 struct bpf_map_memory *src)
-{
-	*dst = *src;
-
-	/* Make sure src will not be used for the redundant uncharging. */
-	memset(src, 0, sizeof(struct bpf_map_memory));
-}
-
-int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
-{
-	int ret;
-
-	ret = bpf_charge_memlock(map->memory.user, pages);
-	if (ret)
-		return ret;
-	map->memory.pages += pages;
-	return ret;
-}
-
-void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages)
-{
-	bpf_uncharge_memlock(map->memory.user, pages);
-	map->memory.pages -= pages;
-}
-
 static int bpf_map_alloc_id(struct bpf_map *map)
 {
 	int id;
@@ -482,14 +428,11 @@ static void bpf_map_release_memcg(struct bpf_map *map)
 static void bpf_map_free_deferred(struct work_struct *work)
 {
 	struct bpf_map *map = container_of(work, struct bpf_map, work);
-	struct bpf_map_memory mem;
 
-	bpf_map_charge_move(&mem, &map->memory);
 	security_bpf_map_free(map);
 	bpf_map_release_memcg(map);
 	/* implementation dependent freeing */
 	map->ops->map_free(map);
-	bpf_map_charge_finish(&mem);
 }
 
 static void bpf_map_put_uref(struct bpf_map *map)
@@ -568,7 +511,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
-		   "memlock:\t%llu\n"
+		   "memlock:\t%llu\n" /* deprecated */
 		   "map_id:\t%u\n"
 		   "frozen:\t%u\n",
 		   map->map_type,
@@ -576,7 +519,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
-		   map->memory.pages * 1ULL << PAGE_SHIFT,
+		   0LLU,
 		   map->id,
 		   READ_ONCE(map->frozen));
 	if (type) {
@@ -819,7 +762,6 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 static int map_create(union bpf_attr *attr)
 {
 	int numa_node = bpf_map_attr_numa_node(attr);
-	struct bpf_map_memory mem;
 	struct bpf_map *map;
 	int f_flags;
 	int err;
@@ -918,9 +860,7 @@ static int map_create(union bpf_attr *attr)
 	security_bpf_map_free(map);
 free_map:
 	btf_put(map->btf);
-	bpf_map_charge_move(&mem, &map->memory);
 	map->ops->map_free(map);
-	bpf_map_charge_finish(&mem);
 	return err;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_map.c
index 08651b23edba..b83b5d2e17dc 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_map.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_map.c
@@ -23,6 +23,6 @@ int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
 
 	BPF_SEQ_PRINTF(seq, "%8u %8ld %8ld %10lu\n", map->id, map->refcnt.counter,
 		       map->usercnt.counter,
-		       map->memory.user->locked_vm.counter);
+		       0LLU);
 	return 0;
 }
diff --git a/tools/testing/selftests/bpf/progs/map_ptr_kern.c b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
index c325405751e2..d8850bc6a9f1 100644
--- a/tools/testing/selftests/bpf/progs/map_ptr_kern.c
+++ b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
@@ -26,17 +26,12 @@ __u32 g_line = 0;
 		return 0;	\
 })
 
-struct bpf_map_memory {
-	__u32 pages;
-} __attribute__((preserve_access_index));
-
 struct bpf_map {
 	enum bpf_map_type map_type;
 	__u32 key_size;
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 id;
-	struct bpf_map_memory memory;
 } __attribute__((preserve_access_index));
 
 static inline int check_bpf_map_fields(struct bpf_map *map, __u32 key_size,
@@ -47,7 +42,6 @@ static inline int check_bpf_map_fields(struct bpf_map *map, __u32 key_size,
 	VERIFY(map->value_size == value_size);
 	VERIFY(map->max_entries == max_entries);
 	VERIFY(map->id > 0);
-	VERIFY(map->memory.pages > 0);
 
 	return 1;
 }
@@ -60,7 +54,6 @@ static inline int check_bpf_map_ptr(struct bpf_map *indirect,
 	VERIFY(indirect->value_size == direct->value_size);
 	VERIFY(indirect->max_entries == direct->max_entries);
 	VERIFY(indirect->id == direct->id);
-	VERIFY(indirect->memory.pages == direct->memory.pages);
 
 	return 1;
 }
-- 
2.26.2
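
With this change, the memlock: line in a map's fdinfo is kept only for compatibility and always reports 0. The sketch below assumes a userspace tool that reads the text of /proc/<pid>/fdinfo/<fd>. It shows one way to look up a numeric field while matching the full key. fdinfo_get_u64() is a hypothetical helper for illustration, not a libbpf API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "key:<whitespace><number>" in fdinfo-style
 * text (one "key:\tvalue" pair per line) and store the number in *out.
 * Returns 0 on success, -1 if the key is missing or has no numeric value.
 */
static int fdinfo_get_u64(const char *text, const char *key,
			  unsigned long long *out)
{
	const char *line = text;
	size_t klen;

	assert(text != NULL && key != NULL && out != NULL);
	klen = strlen(key);

	while (line != NULL && *line != '\0') {
		/* Require ':' right after the key so "map" does not match "map_id". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *val = line + klen + 1;
			char *end;
			/* Base 0 accepts both "%u" output and "%#x" output (map_flags). */
			unsigned long long v = strtoull(val, &end, 0);

			if (end == val)
				return -1;	/* no digits after the key */
			*out = v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

A tool that used the memlock: value as the map's memory footprint should stop relying on it. With memcg-based accounting, the memory is charged to the map's memory cgroup instead.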


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
       [not found] ` <20201112221543.3621014-2-guro@fb.com>
@ 2020-11-12 22:56   ` Stephen Rothwell
  2020-11-13  0:26     ` Roman Gushchin
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Rothwell @ 2020-11-12 22:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	kernel-team, Johannes Weiner, Michal Hocko, Andrew Morton

Hi Roman,

On Thu, 12 Nov 2020 14:15:10 -0800 Roman Gushchin <guro@fb.com> wrote:
>
> Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> 
> Currently a non-slab kernel page which has been charged to a memory cgroup
> can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> flag is defined as a page type (like buddy, offline, etc), so it takes a
> bit from a page->mapped counter.  Pages with a type set can't be mapped to
> userspace.
>
.....
> 
> To make sure nobody uses a direct access, struct page's
> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> 
> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

What is going on here?  You are taking patches from linux-next and
submitting them to another maintainer?  Why?

You should not do that from Andrew's tree as it changes/rebases every
so often ... and you should not have my SOB on there as it is only
there because that patch is in linux-next, i.e. I am in the submission
chain to linux-next - if the patch is to go via some other tree, then
my SOB should not be there.  (The same may be true for Andrew's SOB.)
In general you cannot add someone else's SOB to one of your patch
submissions.

-- 
Cheers,
Stephen Rothwell

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-12 22:56   ` [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data Stephen Rothwell
@ 2020-11-13  0:26     ` Roman Gushchin
  2020-11-13  3:04       ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-13  0:26 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	kernel-team, Johannes Weiner, Michal Hocko, Andrew Morton

On Fri, Nov 13, 2020 at 09:56:32AM +1100, Stephen Rothwell wrote:
> Hi Roman,
> 
> On Thu, 12 Nov 2020 14:15:10 -0800 Roman Gushchin <guro@fb.com> wrote:
> >
> > Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
> > 
> > Currently a non-slab kernel page which has been charged to a memory cgroup
> > can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
> > flag is defined as a page type (like buddy, offline, etc), so it takes a
> > bit from a page->mapped counter.  Pages with a type set can't be mapped to
> > userspace.
> >
> .....
> > 
> > To make sure nobody uses a direct access, struct page's
> > mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> > 
> > Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
> > Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> 
> What is going on here?  You are taking patches from linux-next and
> submitting them to another maintainer?  Why?

Hi Stephen!

These patches are not intended to be merged through the bpf tree.
They are included in the patchset to make bpf selftests pass and for
informational purposes. This is stated in the cover letter.

> 
> You should not do that from Andrew's tree as it changes/rebases every
> so often ... and you should not have my SOB on there as it is only
> there because that patch is in linux-next i.e. I in the submission
> chain to linux-next - if the patch is to go via some other tree, then
> my SOB should not be there.  (The same may be true for Andrew's SOB.)
> In general you cannot add someone else's SOB to one of your patch
> submissions.

I'm sorry for the confusion.

Maybe I should have just listed their titles in the cover letter. I'm not
sure what the best option is for such cross-subsystem dependencies.

Thanks!

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  0:26     ` Roman Gushchin
@ 2020-11-13  3:04       ` Alexei Starovoitov
  2020-11-13  3:18         ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2020-11-13  3:04 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Stephen Rothwell, bpf, Alexei Starovoitov, Daniel Borkmann,
	netdev, Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	kernel-team, Johannes Weiner, Michal Hocko, Andrew Morton

On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> 
> These patches are not intended to be merged through the bpf tree.
> They are included into the patchset to make bpf selftests pass and for
> informational purposes.
> It's written in the cover letter.
...
> Maybe I had to just list their titles in the cover letter. Idk what's
> the best option for such cross-subsystem dependencies.

We had several situations in past releases where dependent patches
were merged into multiple trees. For that to happen cleanly from git's
point of view, one of the maintainers needs to create a stable branch/tag
and let the other maintainers pull that branch into their trees. This way
the SHAs stay the same and no conflicts arise during the merge window.
In this case it sounds like the first 4 patches are already in the mm tree.
Is there a branch/tag I can pull to get the first 4 into bpf-next?

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  3:04       ` Alexei Starovoitov
@ 2020-11-13  3:18         ` Andrew Morton
  2020-11-13  3:25           ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2020-11-13  3:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Roman Gushchin, Stephen Rothwell, bpf, Alexei Starovoitov,
	Daniel Borkmann, netdev, Andrii Nakryiko, Shakeel Butt, linux-mm,
	linux-kernel, kernel-team, Johannes Weiner, Michal Hocko

On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > 
> > These patches are not intended to be merged through the bpf tree.
> > They are included into the patchset to make bpf selftests pass and for
> > informational purposes.
> > It's written in the cover letter.
> ...
> > Maybe I had to just list their titles in the cover letter. Idk what's
> > the best option for such cross-subsystem dependencies.
> 
> We had several situations in the past releases where dependent patches
> were merged into multiple trees. For that to happen cleanly from git pov
> one of the maintainers need to create a stable branch/tag and let other
> maintainers pull that branch into different trees. This way the sha-s
> stay the same and no conflicts arise during the merge window.
> In this case sounds like the first 4 patches are in mm tree already.
> Is there a branch/tag I can pull to get the first 4 into bpf-next?

Not really, at present.  This is largely by design, although it does cause
this problem once or twice a year.

These four patches:

mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
mm-introduce-page-memcg-flags.patch
mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch

are sufficiently reviewed - please pull them into the bpf tree when
convenient.  Once they hit linux-next, I'll drop the -mm copies and the
bpf tree maintainers will then be responsible for whether & when they
get upstream.  

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  3:18         ` Andrew Morton
@ 2020-11-13  3:25           ` Alexei Starovoitov
  2020-11-13  3:40             ` Andrew Morton
  2020-11-13  4:01             ` Roman Gushchin
  0 siblings, 2 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2020-11-13  3:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Gushchin, Stephen Rothwell, bpf, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Andrii Nakryiko,
	Shakeel Butt, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > >
> > > These patches are not intended to be merged through the bpf tree.
> > > They are included into the patchset to make bpf selftests pass and for
> > > informational purposes.
> > > It's written in the cover letter.
> > ...
> > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > the best option for such cross-subsystem dependencies.
> >
> > We had several situations in the past releases where dependent patches
> > were merged into multiple trees. For that to happen cleanly from git pov
> > one of the maintainers need to create a stable branch/tag and let other
> > maintainers pull that branch into different trees. This way the sha-s
> > stay the same and no conflicts arise during the merge window.
> > In this case sounds like the first 4 patches are in mm tree already.
> > Is there a branch/tag I can pull to get the first 4 into bpf-next?
>
> Not really, at present.  This is largely by design, although it does cause
> this problem once or twice a year.
>
> These four patches:
>
> mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> mm-introduce-page-memcg-flags.patch
> mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
>
> are sufficiently reviewed - please pull them into the bpf tree when
> convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> bpf tree maintainers will then be responsible for whether & when they
> get upstream.

That's certainly an option if they don't depend on other patches in the mm tree.
Roman probably knows best?

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  3:25           ` Alexei Starovoitov
@ 2020-11-13  3:40             ` Andrew Morton
  2020-11-13  4:08               ` Alexei Starovoitov
  2020-11-13  4:01             ` Roman Gushchin
  1 sibling, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2020-11-13  3:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Roman Gushchin, Stephen Rothwell, bpf, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Andrii Nakryiko,
	Shakeel Butt, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Thu, 12 Nov 2020 19:25:48 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > > >
> > > > These patches are not intended to be merged through the bpf tree.
> > > > They are included into the patchset to make bpf selftests pass and for
> > > > informational purposes.
> > > > It's written in the cover letter.
> > > ...
> > > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > > the best option for such cross-subsystem dependencies.
> > >
> > > We had several situations in the past releases where dependent patches
> > > were merged into multiple trees. For that to happen cleanly from git pov
> > > one of the maintainers need to create a stable branch/tag and let other
> > > maintainers pull that branch into different trees. This way the sha-s
> > > stay the same and no conflicts arise during the merge window.
> > > In this case sounds like the first 4 patches are in mm tree already.
> > > Is there a branch/tag I can pull to get the first 4 into bpf-next?
> >
> > Not really, at present.  This is largely by design, although it does cause
> > this problem once or twice a year.
> >
> > These four patches:
> >
> > mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> > mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> > mm-introduce-page-memcg-flags.patch
> > mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
> >
> > are sufficiently reviewed - please pull them into the bpf tree when
> > convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> > bpf tree maintainers will then be responsible for whether & when they
> > get upstream.
> 
> That's certainly an option if they don't depend on other patches in the mm tree.
> Roman probably knows best ?

That should be OK.  They apply and compile ;)

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  3:25           ` Alexei Starovoitov
  2020-11-13  3:40             ` Andrew Morton
@ 2020-11-13  4:01             ` Roman Gushchin
  2020-11-13 14:25               ` Shakeel Butt
  1 sibling, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-13  4:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrew Morton, Stephen Rothwell, bpf, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Andrii Nakryiko,
	Shakeel Butt, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Thu, Nov 12, 2020 at 07:25:48PM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > > >
> > > > These patches are not intended to be merged through the bpf tree.
> > > > They are included into the patchset to make bpf selftests pass and for
> > > > informational purposes.
> > > > It's written in the cover letter.
> > > ...
> > > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > > the best option for such cross-subsystem dependencies.
> > >
> > > We had several situations in the past releases where dependent patches
> > > were merged into multiple trees. For that to happen cleanly from git pov
> > > one of the maintainers need to create a stable branch/tag and let other
> > > maintainers pull that branch into different trees. This way the sha-s
> > > stay the same and no conflicts arise during the merge window.
> > > In this case sounds like the first 4 patches are in mm tree already.
> > > Is there a branch/tag I can pull to get the first 4 into bpf-next?
> >
> > Not really, at present.  This is largely by design, although it does cause
> > this problem once or twice a year.
> >
> > These four patches:
> >
> > mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> > mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> > mm-introduce-page-memcg-flags.patch
> > mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
> >
> > are sufficiently reviewed - please pull them into the bpf tree when
> > convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> > bpf tree maintainers will then be responsible for whether & when they
> > get upstream.
> 
> That's certainly an option if they don't depend on other patches in the mm tree.
> Roman probably knows best ?

Yes, they are self-contained and don't depend on any patches in the mm tree.

Thanks!

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  3:40             ` Andrew Morton
@ 2020-11-13  4:08               ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2020-11-13  4:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Gushchin, Stephen Rothwell, bpf, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Andrii Nakryiko,
	Shakeel Butt, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Thu, Nov 12, 2020 at 7:40 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 12 Nov 2020 19:25:48 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > > > >
> > > > > These patches are not intended to be merged through the bpf tree.
> > > > > They are included into the patchset to make bpf selftests pass and for
> > > > > informational purposes.
> > > > > It's written in the cover letter.
> > > > ...
> > > > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > > > the best option for such cross-subsystem dependencies.
> > > >
> > > > We had several situations in the past releases where dependent patches
> > > > were merged into multiple trees. For that to happen cleanly from git pov
> > > > one of the maintainers need to create a stable branch/tag and let other
> > > > maintainers pull that branch into different trees. This way the sha-s
> > > > stay the same and no conflicts arise during the merge window.
> > > > In this case sounds like the first 4 patches are in mm tree already.
> > > > Is there a branch/tag I can pull to get the first 4 into bpf-next?
> > >
> > > Not really, at present.  This is largely by design, although it does cause
> > > this problem once or twice a year.
> > >
> > > These four patches:
> > >
> > > mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> > > mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> > > mm-introduce-page-memcg-flags.patch
> > > mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
> > >
> > > are sufficiently reviewed - please pull them into the bpf tree when
> > > convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> > > bpf tree maintainers will then be responsible for whether & when they
> > > get upstream.
> >
> > That's certainly an option if they don't depend on other patches in the mm tree.
> > Roman probably knows best ?
>
> That should be OK.  They apply and compile ;)

Awesome. Thank you both for confirming.
Will take them as soon as the rest of the set is reviewed.

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13  4:01             ` Roman Gushchin
@ 2020-11-13 14:25               ` Shakeel Butt
  2020-11-13 17:18                 ` Roman Gushchin
  0 siblings, 1 reply; 27+ messages in thread
From: Shakeel Butt @ 2020-11-13 14:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Andrew Morton, Stephen Rothwell, bpf,
	Alexei Starovoitov, Daniel Borkmann, Network Development,
	Andrii Nakryiko, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Thu, Nov 12, 2020 at 8:02 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Nov 12, 2020 at 07:25:48PM -0800, Alexei Starovoitov wrote:
> > On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > > > >
> > > > > These patches are not intended to be merged through the bpf tree.
> > > > > They are included into the patchset to make bpf selftests pass and for
> > > > > informational purposes.
> > > > > It's written in the cover letter.
> > > > ...
> > > > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > > > the best option for such cross-subsystem dependencies.
> > > >
> > > > We had several situations in the past releases where dependent patches
> > > > were merged into multiple trees. For that to happen cleanly from git pov
> > > > one of the maintainers need to create a stable branch/tag and let other
> > > > maintainers pull that branch into different trees. This way the sha-s
> > > > stay the same and no conflicts arise during the merge window.
> > > > In this case sounds like the first 4 patches are in mm tree already.
> > > > Is there a branch/tag I can pull to get the first 4 into bpf-next?
> > >
> > > Not really, at present.  This is largely by design, although it does cause
> > > this problem once or twice a year.
> > >
> > > These four patches:
> > >
> > > mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> > > mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> > > mm-introduce-page-memcg-flags.patch
> > > mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
> > >
> > > are sufficiently reviewed - please pull them into the bpf tree when
> > > convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> > > bpf tree maintainers will then be responsible for whether & when they
> > > get upstream.
> >
> > That's certainly an option if they don't depend on other patches in the mm tree.
> > Roman probably knows best ?
>
> Yes, they are self-contained and don't depend on any patches in the mm tree.
>

The patch "mm, kvm: account kvm_vcpu_mmap to kmemcg" in mm tree
depends on that series.

* Re: [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data
  2020-11-13 14:25               ` Shakeel Butt
@ 2020-11-13 17:18                 ` Roman Gushchin
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Gushchin @ 2020-11-13 17:18 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexei Starovoitov, Andrew Morton, Stephen Rothwell, bpf,
	Alexei Starovoitov, Daniel Borkmann, Network Development,
	Andrii Nakryiko, linux-mm, LKML, Kernel Team, Johannes Weiner,
	Michal Hocko

On Fri, Nov 13, 2020 at 06:25:53AM -0800, Shakeel Butt wrote:
> On Thu, Nov 12, 2020 at 8:02 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Thu, Nov 12, 2020 at 07:25:48PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Nov 12, 2020 at 7:18 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Thu, 12 Nov 2020 19:04:56 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > > On Thu, Nov 12, 2020 at 04:26:10PM -0800, Roman Gushchin wrote:
> > > > > >
> > > > > > These patches are not intended to be merged through the bpf tree.
> > > > > > They are included into the patchset to make bpf selftests pass and for
> > > > > > informational purposes.
> > > > > > It's written in the cover letter.
> > > > > ...
> > > > > > Maybe I had to just list their titles in the cover letter. Idk what's
> > > > > > the best option for such cross-subsystem dependencies.
> > > > >
> > > > > We had several situations in the past releases where dependent patches
> > > > > were merged into multiple trees. For that to happen cleanly from git pov
> > > > > one of the maintainers need to create a stable branch/tag and let other
> > > > > maintainers pull that branch into different trees. This way the sha-s
> > > > > stay the same and no conflicts arise during the merge window.
> > > > > In this case sounds like the first 4 patches are in mm tree already.
> > > > > Is there a branch/tag I can pull to get the first 4 into bpf-next?
> > > >
> > > > Not really, at present.  This is largely by design, although it does cause
> > > > this problem once or twice a year.
> > > >
> > > > These four patches:
> > > >
> > > > mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
> > > > mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
> > > > mm-introduce-page-memcg-flags.patch
> > > > mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
> > > >
> > > > are sufficiently reviewed - please pull them into the bpf tree when
> > > > convenient.  Once they hit linux-next, I'll drop the -mm copies and the
> > > > bpf tree maintainers will then be responsible for whether & when they
> > > > get upstream.
> > >
> > > That's certainly an option if they don't depend on other patches in the mm tree.
> > > Roman probably knows best ?
> >
> > Yes, they are self-contained and don't depend on any patches in the mm tree.
> >
> 
> The patch "mm, kvm: account kvm_vcpu_mmap to kmemcg" in mm tree
> depends on that series.

True, and I believe there are (or will be) more dependencies like this.
But it should be fine: we only have to make sure that these 4 patches
are merged first.

Thanks!

* Re: [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs
  2020-11-12 22:15 ` [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
@ 2020-11-13 17:31   ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 17:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> Include memory used by bpf programs into the memcg-based accounting.
> This includes the memory used by the programs themselves, auxiliary data,
> statistics and bpf line info. The memory cgroup containing the
> process which loads the program is charged.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

* Re: [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps
  2020-11-12 22:15 ` [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps Roman Gushchin
@ 2020-11-13 17:46   ` Song Liu
  2020-11-13 19:40     ` Roman Gushchin
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2020-11-13 17:46 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:

[...]

> 
> +#ifdef CONFIG_MEMCG_KMEM
> +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> +						 void *value, u64 flags)
> +{
> +	struct mem_cgroup *old_memcg;
> +	bool in_interrupt;
> +	int ret;
> +
> +	/*
> +	 * If update from an interrupt context results in a memory allocation,
> +	 * the memory cgroup to charge can't be determined from the context
> +	 * of the current task. Instead, we charge the memory cgroup that
> +	 * contained the process which created the map.
> +	 */
> +	in_interrupt = in_interrupt();
> +	if (in_interrupt)
> +		old_memcg = set_active_memcg(map->memcg);

set_active_memcg() checks in_interrupt() again. Maybe we can introduce another
helper to avoid checking it twice? Something like:

static inline struct mem_cgroup *
set_active_memcg_int(struct mem_cgroup *memcg)
{
        struct mem_cgroup *old;

        old = this_cpu_read(int_active_memcg);
        this_cpu_write(int_active_memcg, memcg);
        return old;
}

Thanks,
Song

[...]
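
The save/restore pattern behind the proposed set_active_memcg_int() can be modeled in userspace C. This sketch rests on simplifying assumptions. A plain global stands in for the per-CPU int_active_memcg variable, and there are no this_cpu_*() operations. struct mem_cgroup is a stub. update_in_interrupt() is a hypothetical caller, not the actual __bpf_map_update_elem().

```c
#include <assert.h>
#include <stddef.h>

/* Stub: only the pointer identity matters in this model. */
struct mem_cgroup {
	int id;
};

/* Model of the per-CPU int_active_memcg variable (a plain global here). */
static struct mem_cgroup *int_active_memcg;

/* Mirrors the proposed helper: swap in memcg, return the previous one. */
static struct mem_cgroup *set_active_memcg_int(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = int_active_memcg;

	int_active_memcg = memcg;
	return old;
}

/*
 * Hypothetical interrupt-context update: make map_memcg the active memcg
 * while the update runs, then restore whatever was active before.
 * Returns the memcg that was active during the update.
 */
static struct mem_cgroup *update_in_interrupt(struct mem_cgroup *map_memcg)
{
	struct mem_cgroup *old = set_active_memcg_int(map_memcg);
	struct mem_cgroup *charged = int_active_memcg;

	set_active_memcg_int(old);
	return charged;
}
```

The helper returns the old pointer so that the caller can restore the previous value rather than reset it to NULL. This keeps nested overrides correct.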

* Re: [PATCH bpf-next v5 07/34] bpf: memcg-based memory accounting for bpf maps
  2020-11-12 22:15 ` [PATCH bpf-next v5 07/34] bpf: " Roman Gushchin
@ 2020-11-13 18:04   ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 18:04 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> This patch enables memcg-based memory accounting for memory allocated
> by __bpf_map_area_alloc(), which is used by many types of bpf maps for
> large memory allocations.
> 
> Following patches in the series will refine the accounting for
> some of the map types.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
> kernel/bpf/syscall.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 2d77fc2496da..fcadf953989f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -280,7 +280,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
> 	 * __GFP_RETRY_MAYFAIL to avoid such situations.
> 	 */
> 
> -	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
> +	const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
> 	unsigned int flags = 0;
> 	unsigned long align = 1;
> 	void *area;
> -- 
> 2.26.2
> 

* Re: [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps
  2020-11-12 22:15 ` [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps Roman Gushchin
@ 2020-11-13 18:07   ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 18:07 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> Account memory used by bpf local storage maps:
> per-socket and per-inode storages.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

[...]


* Re: [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based memory accounting for bpf local storage maps
  2020-11-12 22:15 ` [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based " Roman Gushchin
@ 2020-11-13 18:14   ` Song Liu
  2020-11-13 19:33     ` Roman Gushchin
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2020-11-13 18:14 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> Do not use rlimit-based memory accounting for bpf local storage maps.
> It has been replaced with the memcg-based memory accounting.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> ---
> kernel/bpf/bpf_local_storage.c | 11 -----------
> 1 file changed, 11 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index fd4f9ac1d042..3b0da5a04d55 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c

Do we need to change/remove mem_charge() and mem_uncharge() in 
bpf_local_storage.c? I didn't find that in the set. 

Thanks,
Song

[...]


* Re: [PATCH bpf-next v5 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps
  2020-11-12 22:15 ` [PATCH bpf-next v5 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps Roman Gushchin
@ 2020-11-13 18:17   ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 18:17 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Networking,
	Andrii Nakryiko, Shakeel Butt, Linux MM, open list, Kernel Team



> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> Remove rlimit-based accounting infrastructure code, which is not used
> anymore.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

[...]


* Re: [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based memory accounting for bpf local storage maps
  2020-11-13 18:14   ` Song Liu
@ 2020-11-13 19:33     ` Roman Gushchin
  2020-11-13 20:53       ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-13 19:33 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team

On Fri, Nov 13, 2020 at 10:14:48AM -0800, Song Liu wrote:
> 
> 
> > On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> > 
> > Do not use rlimit-based memory accounting for bpf local storage maps.
> > It has been replaced with the memcg-based memory accounting.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > ---
> > kernel/bpf/bpf_local_storage.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> > 
> > diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> > index fd4f9ac1d042..3b0da5a04d55 100644
> > --- a/kernel/bpf/bpf_local_storage.c
> > +++ b/kernel/bpf/bpf_local_storage.c
> 
> Do we need to change/remove mem_charge() and mem_uncharge() in 
> bpf_local_storage.c? I didn't find that in the set.

No, those are used for per-socket memory limits (see sk_storage_charge()
and omem_charge()).

Btw, thanks for looking into the patchset!
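For contrast with memcg charging, here is a rough userspace model of the per-socket check referred to above. It follows our reading of omem_charge() in net/core/bpf_sk_storage.c, which reuses the check from sock_kmalloc(), but struct model_sock, model_optmem_max and its value are hypothetical stand-ins for sk->sk_omem_alloc and the net.core.optmem_max sysctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the socket's sk_omem_alloc counter. */
struct model_sock {
	atomic_uint omem_alloc;
};

/* Stand-in for the optmem_max sysctl; illustrative value only. */
static unsigned int model_optmem_max = 20480;

/* Modeled on omem_charge(): admit the charge only if it keeps the socket's
 * option-memory total below optmem_max. */
static int model_omem_charge(struct model_sock *sk, unsigned int size)
{
	if (size <= model_optmem_max &&
	    atomic_load(&sk->omem_alloc) + size < model_optmem_max) {
		atomic_fetch_add(&sk->omem_alloc, size);
		return 0;
	}
	return -ENOMEM;
}

static void model_omem_uncharge(struct model_sock *sk, unsigned int size)
{
	unsigned int prev = atomic_fetch_sub(&sk->omem_alloc, size);

	assert(prev >= size);   /* catch unbalanced uncharges */
	(void)prev;
}
```

The bound is per socket, so it is independent of whether the map itself is accounted via the memlock rlimit or via memcg. The read and the add are two separate atomic steps, so under concurrent charging the limit is approximate rather than exact.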


* Re: [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps
  2020-11-13 17:46   ` Song Liu
@ 2020-11-13 19:40     ` Roman Gushchin
  2020-11-13 20:48       ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Gushchin @ 2020-11-13 19:40 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team

On Fri, Nov 13, 2020 at 09:46:49AM -0800, Song Liu wrote:
> 
> 
> > On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
> 
> [...]
> 
> > 
> > +#ifdef CONFIG_MEMCG_KMEM
> > +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> > +						 void *value, u64 flags)
> > +{
> > +	struct mem_cgroup *old_memcg;
> > +	bool in_interrupt;
> > +	int ret;
> > +
> > +	/*
> > +	 * If an update from an interrupt context results in a memory allocation,
> > +	 * the memory cgroup to charge can't be determined from the context
> > +	 * of the current task. Instead, we charge the memory cgroup that
> > +	 * contained the process which created the map.
> > +	 */
> > +	in_interrupt = in_interrupt();
> > +	if (in_interrupt)
> > +		old_memcg = set_active_memcg(map->memcg);
> 
> set_active_memcg() checks in_interrupt() again. Maybe we can introduce another
> helper to avoid checking it twice? Something like
> 
> static inline struct mem_cgroup *
> set_active_memcg_int(struct mem_cgroup *memcg)
> {
>         struct mem_cgroup *old;
> 
>         old = this_cpu_read(int_active_memcg);
>         this_cpu_write(int_active_memcg, memcg);
>         return old;
> }

Yeah, it's a good idea!

The in_interrupt() check is very cheap (it's basically testing some bits in a per-cpu
variable), so I don't think there will be any measurable difference. So I suggest
implementing it later as an enhancement on top (maybe in the next merge window), to
avoid another delay. Otherwise I'll need to send a patch to mm@, wait for reviews and
for inclusion into the mm tree, etc. Does it work for you?

Thanks!
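The exchange above turns on two details: how set_active_memcg() decides where to store the override, and how cheap in_interrupt() is. The sketch below is a single-CPU, single-task userspace model, not kernel code. Its preempt-count layout loosely follows include/linux/preempt.h (an assumption; field widths have changed between kernel versions), toy_set_active_memcg_int() corresponds to the helper Song proposes, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed preempt_count layout: bits 0-7 preempt, 8-15 softirq,
 * 16-19 hardirq, 20-23 NMI. */
#define TOY_SOFTIRQ_SHIFT 8
#define TOY_HARDIRQ_SHIFT 16
#define TOY_NMI_SHIFT     20
#define TOY_SOFTIRQ_MASK  (0xffu << TOY_SOFTIRQ_SHIFT)
#define TOY_HARDIRQ_MASK  (0xfu << TOY_HARDIRQ_SHIFT)
#define TOY_NMI_MASK      (0xfu << TOY_NMI_SHIFT)

struct toy_memcg {
	const char *name;
};

/* One CPU and one task, so "per-cpu" and "current" are plain globals here. */
static unsigned int toy_preempt_count;          /* stands in for preempt_count() */
static struct toy_memcg *toy_int_active_memcg;  /* stands in for int_active_memcg */
static struct toy_memcg *toy_task_active_memcg; /* stands in for current->active_memcg */

/* Like in_interrupt(): one load and one mask test. */
static bool toy_in_interrupt(void)
{
	return (toy_preempt_count &
		(TOY_NMI_MASK | TOY_HARDIRQ_MASK | TOY_SOFTIRQ_MASK)) != 0;
}

/* Shape of set_active_memcg(): pick the slot by context, return the old value. */
static struct toy_memcg *toy_set_active_memcg(struct toy_memcg *memcg)
{
	struct toy_memcg *old;

	if (toy_in_interrupt()) {
		old = toy_int_active_memcg;
		toy_int_active_memcg = memcg;
	} else {
		old = toy_task_active_memcg;
		toy_task_active_memcg = memcg;
	}
	return old;
}

/* Proposed variant: the caller already knows it is in interrupt context. */
static struct toy_memcg *toy_set_active_memcg_int(struct toy_memcg *memcg)
{
	struct toy_memcg *old = toy_int_active_memcg;

	toy_int_active_memcg = memcg;
	return old;
}

/* Memcg an allocation would be charged to in this model. In interrupt
 * context only the override counts (NULL means "left uncharged"). */
static struct toy_memcg *toy_charge_target(struct toy_memcg *task_memcg)
{
	if (toy_in_interrupt())
		return toy_int_active_memcg;
	return toy_task_active_memcg ? toy_task_active_memcg : task_memcg;
}

/* Shape of the __bpf_map_update_elem() wrapper: returns the memcg that an
 * allocation made during the update would be charged to. */
static struct toy_memcg *toy_map_update(struct toy_memcg *map_memcg,
					struct toy_memcg *task_memcg)
{
	struct toy_memcg *old = NULL, *target;
	bool irq_ctx = toy_in_interrupt();

	if (irq_ctx)
		old = toy_set_active_memcg(map_memcg); /* checks the context again */

	target = toy_charge_target(task_memcg);   /* the real update allocates here */

	if (irq_ctx) {
		toy_set_active_memcg(old);             /* restore the previous override */
		assert(toy_int_active_memcg == old);
	}
	return target;
}
```

With the generic setter the interrupt check runs twice per update, once in the wrapper and once inside the setter; calling toy_set_active_memcg_int() in the wrapper removes the second one. Since each check is a single load and mask, the saving is small, which is consistent with deferring the helper to a later merge window.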


* Re: [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps
  2020-11-13 19:40     ` Roman Gushchin
@ 2020-11-13 20:48       ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 20:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 13, 2020, at 11:40 AM, Roman Gushchin <guro@fb.com> wrote:
> 
> On Fri, Nov 13, 2020 at 09:46:49AM -0800, Song Liu wrote:
>> 
>> 
>>> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
>> 
>> [...]
>> 
>>> 
>>> +#ifdef CONFIG_MEMCG_KMEM
>>> +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
>>> +						 void *value, u64 flags)
>>> +{
>>> +	struct mem_cgroup *old_memcg;
>>> +	bool in_interrupt;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * If an update from an interrupt context results in a memory allocation,
>>> +	 * the memory cgroup to charge can't be determined from the context
>>> +	 * of the current task. Instead, we charge the memory cgroup that
>>> +	 * contained the process which created the map.
>>> +	 */
>>> +	in_interrupt = in_interrupt();
>>> +	if (in_interrupt)
>>> +		old_memcg = set_active_memcg(map->memcg);
>> 
>> set_active_memcg() checks in_interrupt() again. Maybe we can introduce another
>> helper to avoid checking it twice? Something like
>> 
>> static inline struct mem_cgroup *
>> set_active_memcg_int(struct mem_cgroup *memcg)
>> {
>>        struct mem_cgroup *old;
>> 
>>        old = this_cpu_read(int_active_memcg);
>>        this_cpu_write(int_active_memcg, memcg);
>>        return old;
>> }
> 
> Yeah, it's a good idea!
> 
> The in_interrupt() check is very cheap (it's basically testing some bits in a per-cpu
> variable), so I don't think there will be any measurable difference. So I suggest
> implementing it later as an enhancement on top (maybe in the next merge window), to
> avoid another delay. Otherwise I'll need to send a patch to mm@, wait for reviews and
> for inclusion into the mm tree, etc. Does it work for you?

Yeah, that works. 

Acked-by: Song Liu <songliubraving@fb.com>



* Re: [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based memory accounting for bpf local storage maps
  2020-11-13 19:33     ` Roman Gushchin
@ 2020-11-13 20:53       ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2020-11-13 20:53 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, netdev,
	Andrii Nakryiko, Shakeel Butt, linux-mm, linux-kernel,
	Kernel Team



> On Nov 13, 2020, at 11:33 AM, Roman Gushchin <guro@fb.com> wrote:
> 
> On Fri, Nov 13, 2020 at 10:14:48AM -0800, Song Liu wrote:
>> 
>> 
>>> On Nov 12, 2020, at 2:15 PM, Roman Gushchin <guro@fb.com> wrote:
>>> 
>>> Do not use rlimit-based memory accounting for bpf local storage maps.
>>> It has been replaced with the memcg-based memory accounting.
>>> 
>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>> ---
>>> kernel/bpf/bpf_local_storage.c | 11 -----------
>>> 1 file changed, 11 deletions(-)
>>> 
>>> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
>>> index fd4f9ac1d042..3b0da5a04d55 100644
>>> --- a/kernel/bpf/bpf_local_storage.c
>>> +++ b/kernel/bpf/bpf_local_storage.c
>> 
>> Do we need to change/remove mem_charge() and mem_uncharge() in 
>> bpf_local_storage.c? I didn't find that in the set.
> 
> No, those are used for per-socket memory limits (see sk_storage_charge()
> and omem_charge()).

I see. Thanks for the explanation. 

Acked-by: Song Liu <songliubraving@fb.com>



end of thread, other threads:[~2020-11-13 20:53 UTC | newest]

Thread overview: 27+ messages
2020-11-12 22:15 [PATCH bpf-next v5 00/34] bpf: switch to memcg-based memory accounting Roman Gushchin
2020-11-12 22:15 ` [PATCH bpf-next v5 05/34] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
2020-11-13 17:31   ` Song Liu
2020-11-12 22:15 ` [PATCH bpf-next v5 06/34] bpf: prepare for memcg-based memory accounting for bpf maps Roman Gushchin
2020-11-13 17:46   ` Song Liu
2020-11-13 19:40     ` Roman Gushchin
2020-11-13 20:48       ` Song Liu
2020-11-12 22:15 ` [PATCH bpf-next v5 07/34] bpf: " Roman Gushchin
2020-11-13 18:04   ` Song Liu
2020-11-12 22:15 ` [PATCH bpf-next v5 15/34] bpf: memcg-based memory accounting for bpf local storage maps Roman Gushchin
2020-11-13 18:07   ` Song Liu
2020-11-12 22:15 ` [PATCH bpf-next v5 31/34] bpf: eliminate rlimit-based " Roman Gushchin
2020-11-13 18:14   ` Song Liu
2020-11-13 19:33     ` Roman Gushchin
2020-11-13 20:53       ` Song Liu
2020-11-12 22:15 ` [PATCH bpf-next v5 32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps Roman Gushchin
2020-11-13 18:17   ` Song Liu
     [not found] ` <20201112221543.3621014-2-guro@fb.com>
2020-11-12 22:56   ` [PATCH bpf-next v5 01/34] mm: memcontrol: use helpers to read page's memcg data Stephen Rothwell
2020-11-13  0:26     ` Roman Gushchin
2020-11-13  3:04       ` Alexei Starovoitov
2020-11-13  3:18         ` Andrew Morton
2020-11-13  3:25           ` Alexei Starovoitov
2020-11-13  3:40             ` Andrew Morton
2020-11-13  4:08               ` Alexei Starovoitov
2020-11-13  4:01             ` Roman Gushchin
2020-11-13 14:25               ` Shakeel Butt
2020-11-13 17:18                 ` Roman Gushchin
