From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [PATCH bpf-next] bpf: enable btf for use in all maps
Date: Thu, 9 Aug 2018 14:44:56 -0700
Message-ID: <20180809214430.gd4zwsnmbwmq7b26@ast-mbp>
References: <20180809194220.17484-1-daniel@iogearbox.net>
 <20180809211416.oznmx5jlnbagkk3w@ast-mbp>
 <d851548d-e027-57ee-55c9-a61c48ffcf15@iogearbox.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: ast@kernel.org, netdev@vger.kernel.org, yhs@fb.com
To: Daniel Borkmann <daniel@iogearbox.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-pg1-f193.google.com ([209.85.215.193]:40060 "EHLO
        mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1727461AbeHJALn (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 9 Aug 2018 20:11:43 -0400
Received: by mail-pg1-f193.google.com with SMTP id x5-v6so3351655pgp.7
        for <netdev@vger.kernel.org>; Thu, 09 Aug 2018 14:44:58 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <d851548d-e027-57ee-55c9-a61c48ffcf15@iogearbox.net>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Thu, Aug 09, 2018 at 11:30:52PM +0200, Daniel Borkmann wrote:
> On 08/09/2018 11:14 PM, Alexei Starovoitov wrote:
> > On Thu, Aug 09, 2018 at 09:42:20PM +0200, Daniel Borkmann wrote:
> >> Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
> >> the basic arraymap") enabled support for BTF and dumping via
> >> BPF fs for arraymap. However, both can be decoupled from each
> >> other such that all BPF maps can be supported for attaching
> >> BTF key/value information, while not all maps necessarily
> >> need to dump via map_seq_show_elem() callback.
> >>
> >> The check in array_map_check_btf() can be generalized as
> >> ultimately the key and value size is the only constraint
> >> that needs to match for the map. The fact that the key needs
> >> to be of type int is optional; it could be any data type as
> >> long as it matches the 4 byte key size, just like hash table
> >> key or others could be of any data type as well.
> >>
> >> Minimal example of a hash table dump which then works out
> >> of the box for bpftool:
> >>
> >>   # bpftool map dump id 19
> >>   [{
> >>           "key": {
> >>               "": {
> >>                   "vip": 0,
> >>                   "vipv6": []
> >>               },
> >>               "port": 0,
> >>               "family": 0,
> >>               "proto": 0
> >>           },
> >>           "value": {
> >>               "flags": 0,
> >>               "vip_num": 0
> >>           }
> >>       }
> >>   ]
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Cc: Yonghong Song <yhs@fb.com>
> >> ---
> >>  include/linux/bpf.h   |  4 +---
> >>  kernel/bpf/arraymap.c | 27 ---------------------------
> >>  kernel/bpf/inode.c    |  3 ++-
> >>  kernel/bpf/syscall.c  | 24 ++++++++++++++++++++----
> >>  4 files changed, 23 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index cd8790d..eb76e8e 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -48,8 +48,6 @@ struct bpf_map_ops {
> >>  	u32 (*map_fd_sys_lookup_elem)(void *ptr);
> >>  	void (*map_seq_show_elem)(struct bpf_map *map, void *key,
> >>  				  struct seq_file *m);
> >> -	int (*map_check_btf)(const struct bpf_map *map, const struct btf *btf,
> >> -			     u32 key_type_id, u32 value_type_id);
> >>  };
> >>  
> >>  struct bpf_map {
> >> @@ -118,7 +116,7 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
> >>  
> >>  static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
> >>  {
> >> -	return map->ops->map_seq_show_elem && map->ops->map_check_btf;
> >> +	return map->btf && map->ops->map_seq_show_elem;
> >>  }
> >>  
> >>  extern const struct bpf_map_ops bpf_map_offload_ops;
> >> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> >> index 2aa55d030..67f0bdf 100644
> >> --- a/kernel/bpf/arraymap.c
> >> +++ b/kernel/bpf/arraymap.c
> >> @@ -358,32 +358,6 @@ static void array_map_seq_show_elem(struct bpf_map *map, void *key,
> >>  	rcu_read_unlock();
> >>  }
> >>  
> >> -static int array_map_check_btf(const struct bpf_map *map, const struct btf *btf,
> >> -			       u32 btf_key_id, u32 btf_value_id)
> >> -{
> >> -	const struct btf_type *key_type, *value_type;
> >> -	u32 key_size, value_size;
> >> -	u32 int_data;
> >> -
> >> -	key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
> >> -	if (!key_type || BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
> >> -		return -EINVAL;
> >> -
> >> -	int_data = *(u32 *)(key_type + 1);
> >> -	/* bpf array can only take a u32 key.  This check makes
> >> -	 * sure that the btf matches the attr used during map_create.
> >> -	 */
> >> -	if (BTF_INT_BITS(int_data) != 32 || key_size != 4 ||
> >> -	    BTF_INT_OFFSET(int_data))
> >> -		return -EINVAL;
> > 
> > I think most of these checks are still necessary for array type.
> > Relaxing BTF array key from BTF_KIND_INT to, for example, BTF_KIND_ENUM
> > is probably ok, but key being BTF_KIND_PTR or BTF_KIND_ARRAY doesn't make sense.
> 
> Hmm, so on 64 bit archs BTF_KIND_PTR would get rejected for array,
> on 32 bit it may be allowed due to sizeof(void *) == 4. BTF_KIND_ARRAY
> could be array of u8 foo[4], for example, or u16 foo[2]. But how would
> it ultimately be different from e.g. having 'struct a' versus 'struct b'
> where both are of same size and while actual key has 'struct a', the one
> who writes the prog resp. loads the BTF into the kernel would lie about
> it stating it's of type 'struct b' instead? It's basically trusting the
> app that it advertised sane key types which kernel is propagating back.

For hash map - yes. The kernel cannot yet catch the lie when the key
type the user declared in BTF ('struct a') is not what the program
actually used ('struct b' of the same size).
Eventually we will annotate all loads/stores in the program and will
make sure that memory accesses match what BTF said.
For array we can already catch the lie that the key is not a 4-byte int
today, and it matters from a pretty-printing point of view.
If it's PTR or ARRAY or STRUCT, the printer will go nuts.
When userspace can trust the kernel that the array key is u32, it can print
int arr[10];
just like gdb does:
(gdb) p arr
$1 = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
(gdb) ptype arr
type = int [10]
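
To illustrate why a trusted u32 index makes this easy, here is a rough
user-space sketch of such a printer. It is not bpftool code; the helper
name sketch_format_int_array() and its signature are made up for this
example. Since the index is known to be a plain 0..n-1 integer, the
printer can omit the keys and emit only the values, gdb style:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a kernel or bpftool API): render n ints as
 * "{a, b, c}" into buf.  Returns the number of characters written,
 * excluding the trailing NUL, or -1 if buf is too small.
 */
static int sketch_format_int_array(char *buf, size_t len,
				   const int *vals, size_t n)
{
	size_t off, i;
	int ret;

	ret = snprintf(buf, len, "{");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	off = (size_t)ret;

	for (i = 0; i < n; i++) {
		/* Index i is implicit: element i is simply the i-th value. */
		ret = snprintf(buf + off, len - off, "%s%d",
			       i ? ", " : "", vals[i]);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}

	ret = snprintf(buf + off, len - off, "}");
	if (ret < 0 || (size_t)ret >= len - off)
		return -1;
	return (int)(off + (size_t)ret);
}
```

With a PTR or STRUCT "index" there is no such implicit ordering to rely
on, so a helper like this would have nothing sensible to do.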

What is the user-space printer supposed to do if the kernel says that
key=PTR or, worse, key=STRUCT?
I cannot think of a sane way of printing such an array.
Even key=ENUM is not trivial to print. I think it can be useful and
practical to use ENUM as a key, but for now I'd stick to INT only,
like the check does today.
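
To make the difference concrete, here is a stand-alone user-space sketch
(not the kernel code) contrasting the INT-only array check with a
size-only check. All sketch_* names are hypothetical. The kind numbers
follow include/uapi/linux/btf.h, while the bits/offset masks are a
simplified assumption about the INT encoding:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BTF kinds discussed in this thread. */
enum sketch_btf_kind {
	SKETCH_KIND_INT    = 1,
	SKETCH_KIND_PTR    = 2,
	SKETCH_KIND_ARRAY  = 3,
	SKETCH_KIND_STRUCT = 4,
	SKETCH_KIND_ENUM   = 6,
};

/* Assumed layout of the u32 that follows an INT type descriptor. */
#define SKETCH_INT_OFFSET(v)	(((v) & 0x00ff0000) >> 16)
#define SKETCH_INT_BITS(v)	((v) & 0x000000ff)

/* Simplified description of an already resolved key type. */
struct sketch_key_desc {
	enum sketch_btf_kind kind;
	uint32_t int_data;	/* only meaningful for SKETCH_KIND_INT */
	uint32_t size;		/* resolved size in bytes */
};

/* INT-only check: the key must be a 32-bit int at bit offset 0. */
static bool sketch_array_key_ok(const struct sketch_key_desc *key)
{
	if (key->kind != SKETCH_KIND_INT)
		return false;
	return SKETCH_INT_BITS(key->int_data) == 32 &&
	       SKETCH_INT_OFFSET(key->int_data) == 0 &&
	       key->size == 4;
}

/* Size-only check: any kind passes as long as the byte size matches. */
static bool sketch_size_only_ok(const struct sketch_key_desc *key,
				uint32_t map_key_size)
{
	return key->size == map_key_size;
}
```

A 'u8 foo[4]' key (kind ARRAY, size 4) passes sketch_size_only_ok() but
is rejected by sketch_array_key_ok(), which is the case I'd like to keep
rejecting for array maps.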