From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andrii Nakryiko <andriin@fb.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, ast@fb.com,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com,
	kernel-team@fb.com, Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v4 bpf-next 2/4] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY
Date: Thu, 14 Nov 2019 20:45:19 -0800	[thread overview]
Message-ID: <20191115044518.sqh3y3bwtjfp5zex@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20191115040225.2147245-3-andriin@fb.com>

On Thu, Nov 14, 2019 at 08:02:23PM -0800, Andrii Nakryiko wrote:
> Add the ability to memory-map the contents of a BPF array map. This is
> extremely useful for working with BPF global data from userspace programs. It
> allows avoiding typical bpf_map_{lookup,update}_elem operations, improving
> both performance and usability.
> 
> There had to be special considerations for map freezing, to avoid having a
> writable memory view into a frozen map. To solve this issue, map freezing and
> mmap-ing now happen under a mutex:
>   - if map is already frozen, no writable mapping is allowed;
>   - if map has writable memory mappings active (accounted in map->writecnt),
>     map freezing will keep failing with -EBUSY;
>   - once number of writable memory mappings drops to zero, map freezing can be
>     performed again.
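The interlock described above can be sketched as a userspace C simulation (this is not the kernel code; `struct map_state`, `map_freeze()`, `map_mmap_writable()`, and `map_munmap_writable()` are hypothetical stand-ins, with `freeze_mutex` and `writecnt` mirroring the fields the patch adds to `struct bpf_map`):

```c
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for the freeze state the patch adds to bpf_map. */
struct map_state {
	pthread_mutex_t freeze_mutex;
	unsigned long long writecnt; /* active writable mappings */
	bool frozen;
};

/* Freezing keeps failing with -EBUSY while writable mappings exist. */
static int map_freeze(struct map_state *m)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (m->writecnt)
		err = -16; /* -EBUSY */
	else
		m->frozen = true;
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}

/* A writable mapping is refused once the map is frozen. */
static int map_mmap_writable(struct map_state *m)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (m->frozen)
		err = -1; /* -EPERM */
	else
		m->writecnt++;
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}

/* Once writecnt drops back to zero, freezing can succeed again. */
static void map_munmap_writable(struct map_state *m)
{
	pthread_mutex_lock(&m->freeze_mutex);
	m->writecnt--;
	pthread_mutex_unlock(&m->freeze_mutex);
}
```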
> 
> Only non-per-CPU plain arrays are supported right now. Maps with spinlocks
> can't be memory-mapped either.
> 
> For BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc()
> to be mmap()'able. We also need to make sure that array data memory is
> page-sized and page-aligned, so we over-allocate memory in such a way that
> struct bpf_array is at the end of a single page of memory with array->value
> being aligned with the start of the second page. On deallocation we need to
> accommodate this memory arrangement to free vmalloc()'ed memory correctly.
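The placement arithmetic can be illustrated in plain C (a sketch only: `PAGE_SIZE`, `PAGE_ALIGN`, and `struct fake_array` are simplified stand-ins for the kernel's macros and `struct bpf_array`; the actual allocation goes through vmalloc()):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Simplified stand-in for struct bpf_array: a header followed by
 * element data that must be page-sized and page-aligned for mmap(). */
struct fake_array {
	char header[200]; /* placeholder for the bpf_map/array fields */
	char value[];     /* element data; must start on a page boundary */
};

/* Given a page-aligned, over-allocated buffer, place the struct so it
 * ends exactly at a page boundary: the header occupies the tail of the
 * first page and value[] begins at the start of the second page. */
static struct fake_array *place_array(void *data)
{
	return (struct fake_array *)((char *)data +
			PAGE_ALIGN(sizeof(struct fake_array)) -
			sizeof(struct fake_array));
}
```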
> 
> One important consideration is how the memory-mapping subsystem functions.
> It provides a few optional callbacks, among them open() and close(). close()
> is called for each memory region that is unmapped, so that users can decrease
> their reference counters and free up resources, if necessary. open() is
> *almost* symmetrical: it's called for each memory region that is being
> mapped, **except** the very first one. So bpf_map_mmap() does the initial
> refcnt bump, while open() does any extra ones after that. Thus the number of
> close() calls is equal to the number of open() calls plus one.
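The resulting accounting can be simulated in a few lines of userspace C (`map_mmap()`, `vm_ops_open()`, and `vm_ops_close()` are hypothetical stand-ins for the mmap handler and the vm_operations_struct callbacks, tracking a single counter instead of the real map refcount):

```c
/* Simulated map refcount touched by the mmap path. */
static long refcnt;

/* What the mmap handler does for the very first mapping:
 * the initial refcnt bump happens here, not in open(). */
static void map_mmap(void)  { refcnt++; }

/* open() runs for every additional region (e.g. when a VMA is
 * split or duplicated), but NOT for the initial mapping. */
static void vm_ops_open(void)  { refcnt++; }

/* close() runs for every region that is unmapped, with no
 * exception, so over a mapping's lifetime close() fires exactly
 * once more than open(), balancing the initial bump. */
static void vm_ops_close(void) { refcnt--; }
```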
> 
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Rik van Riel <riel@surriel.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  include/linux/bpf.h            | 11 ++--
>  include/linux/vmalloc.h        |  1 +
>  include/uapi/linux/bpf.h       |  3 ++
>  kernel/bpf/arraymap.c          | 59 +++++++++++++++++---
>  kernel/bpf/syscall.c           | 99 ++++++++++++++++++++++++++++++++--
>  mm/vmalloc.c                   | 20 +++++++
>  tools/include/uapi/linux/bpf.h |  3 ++
>  7 files changed, 184 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6fbe599fb977..8021fce98868 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -12,6 +12,7 @@
>  #include <linux/err.h>
>  #include <linux/rbtree_latch.h>
>  #include <linux/numa.h>
> +#include <linux/mm_types.h>
>  #include <linux/wait.h>
>  #include <linux/u64_stats_sync.h>
>  
> @@ -66,6 +67,7 @@ struct bpf_map_ops {
>  				     u64 *imm, u32 off);
>  	int (*map_direct_value_meta)(const struct bpf_map *map,
>  				     u64 imm, u32 *off);
> +	int (*map_mmap)(struct bpf_map *map, struct vm_area_struct *vma);
>  };
>  
>  struct bpf_map_memory {
> @@ -94,9 +96,10 @@ struct bpf_map {
>  	u32 btf_value_type_id;
>  	struct btf *btf;
>  	struct bpf_map_memory memory;
> +	char name[BPF_OBJ_NAME_LEN];
>  	bool unpriv_array;
> -	bool frozen; /* write-once */
> -	/* 48 bytes hole */
> +	bool frozen; /* write-once; write-protected by freeze_mutex */
> +	/* 22 bytes hole */
>  
>  	/* The 3rd and 4th cacheline with misc members to avoid false sharing
>  	 * particularly with refcounting.
> @@ -104,7 +107,8 @@ struct bpf_map {
>  	atomic64_t refcnt ____cacheline_aligned;
>  	atomic64_t usercnt;
>  	struct work_struct work;
> -	char name[BPF_OBJ_NAME_LEN];
> +	struct mutex freeze_mutex;
> +	u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
>  };

Can the mutex be moved into bpf_array instead of being in bpf_map, which is
shared across all map types?
If so, can you reuse the mutex that Daniel is adding in his patch 6/8
of the tail_call series? Not sure what the right name for such a mutex
would be. It will be used for your map_freeze logic and for Daniel's text_poke.



Thread overview: 25+ messages
2019-11-15  4:02 [PATCH v4 bpf-next 0/4] Add support for memory-mapping BPF array maps Andrii Nakryiko
2019-11-15  4:02 ` [PATCH v4 bpf-next 1/4] bpf: switch bpf_map ref counter to 64bit so bpf_map_inc never fails Andrii Nakryiko
2019-11-15 21:47   ` Song Liu
2019-11-15 23:23   ` Daniel Borkmann
2019-11-15 23:27     ` Andrii Nakryiko
2019-11-15  4:02 ` [PATCH v4 bpf-next 2/4] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY Andrii Nakryiko
2019-11-15  4:45   ` Alexei Starovoitov [this message]
2019-11-15  5:05     ` Andrii Nakryiko
2019-11-15  5:08       ` Alexei Starovoitov
2019-11-15  5:43         ` Andrii Nakryiko
2019-11-15 16:36           ` Andrii Nakryiko
2019-11-15 20:42             ` Jakub Kicinski
2019-11-15 13:57   ` Johannes Weiner
2019-11-15 23:31   ` Daniel Borkmann
2019-11-15 23:37     ` Alexei Starovoitov
2019-11-15 23:44       ` Daniel Borkmann
2019-11-15 23:47         ` Alexei Starovoitov
2019-11-16  0:13           ` Daniel Borkmann
2019-11-16  1:18             ` Alexei Starovoitov
2019-11-17  5:57               ` Andrii Nakryiko
2019-11-17 12:07                 ` Daniel Borkmann
2019-11-17 17:17                   ` Andrii Nakryiko
2019-11-18 13:50                     ` Daniel Borkmann
2019-11-15  4:02 ` [PATCH v4 bpf-next 3/4] libbpf: make global data internal arrays mmap()-able, if possible Andrii Nakryiko
2019-11-15  4:02 ` [PATCH v4 bpf-next 4/4] selftests/bpf: add BPF_TYPE_MAP_ARRAY mmap() tests Andrii Nakryiko
