Re: [PATCH bpf-next v2 1/2] xsk: store struct xdp_sock as a flexible array member of the XSKMAP

From: "Björn Töpel" <bjorn.topel@gmail.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Netdev <netdev@vger.kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Björn Töpel" <bjorn.topel@intel.com>, bpf <bpf@vger.kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>
Subject: Re: [PATCH bpf-next v2 1/2] xsk: store struct xdp_sock as a flexible array member of the XSKMAP
Date: Mon, 28 Oct 2019 23:11:50 +0100	[thread overview]
Message-ID: <CAJ+HfNhVZFNV3bZPhhiAd8ObechCJ5CdODM=W1Qf0wdN97TL=w@mail.gmail.com> (raw)
In-Reply-To: <20191028105508.4173bf8b@cakuba.hsd1.ca.comcast.net>

On Mon, 28 Oct 2019 at 18:55, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Fri, 25 Oct 2019 09:18:40 +0200, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > Prior this commit, the array storing XDP socket instances were stored
> > in a separate allocated array of the XSKMAP. Now, we store the sockets
> > as a flexible array member in a similar fashion as the arraymap. Doing
> > so, we do less pointer chasing in the lookup.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Thanks for the re-spin.
>
> > diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
> > index 82a1ffe15dfa..a83e92fe2971 100644
> > --- a/kernel/bpf/xskmap.c
> > +++ b/kernel/bpf/xskmap.c
>
> > @@ -92,44 +93,35 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
> >           attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
> >               return ERR_PTR(-EINVAL);
> >
> > -     m = kzalloc(sizeof(*m), GFP_USER);
> > -     if (!m)
> > +     numa_node = bpf_map_attr_numa_node(attr);
> > +     size = struct_size(m, xsk_map, attr->max_entries);
> > +     cost = size + array_size(sizeof(*m->flush_list), num_possible_cpus());
>
> Now we didn't use array_size() previously because the sum here may
> overflow.
>
> We could use __ab_c_size() here, the name is probably too ugly to use
> directly and IDK what we'd have to name such a accumulation helper...
>
> So maybe just make cost and size a u64 and we should be in the clear.
>

Hmm, but both:
  int bpf_map_charge_init(struct bpf_map_memory *mem, size_t size);
  void *bpf_map_area_alloc(size_t size, int numa_node);
pass size as size_t, so casting to u64 doesn't really help on 32-bit
systems, right?

Wdyt about simply adding:
  if (cost < size)
    return ERR_PTR(-EINVAL)
after the cost calculation for explicit overflow checking?

So, if size's struct_size overflows, the allocation will fail.
And we'll catch the cost overflow with the if-statement, no?

Another option is changing the size_t in bpf_map_... to u64. Maybe
that's better, since arraymap and devmap uses u64 for cost/size.

Björn

> > +     err = bpf_map_charge_init(&mem, cost);
> > +     if (err < 0)
> > +             return ERR_PTR(err);
> > +
> > +     m = bpf_map_area_alloc(size, numa_node);
> > +     if (!m) {
> > +             bpf_map_charge_finish(&mem);
> >               return ERR_PTR(-ENOMEM);
> > +     }
> >
> >       bpf_map_init_from_attr(&m->map, attr);
> > +     bpf_map_charge_move(&m->map.memory, &mem);
> >       spin_lock_init(&m->lock);
> >
> > -     cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
> > -     cost += sizeof(struct list_head) * num_possible_cpus();
> > -
> > -     /* Notice returns -EPERM on if map size is larger than memlock limit */
> > -     err = bpf_map_charge_init(&m->map.memory, cost);
> > -     if (err)
> > -             goto free_m;
> > -
> > -     err = -ENOMEM;
> > -
> >       m->flush_list = alloc_percpu(struct list_head);
> > -     if (!m->flush_list)
> > -             goto free_charge;
> > +     if (!m->flush_list) {
> > +             bpf_map_charge_finish(&m->map.memory);
> > +             bpf_map_area_free(m);
> > +             return ERR_PTR(-ENOMEM);
> > +     }
> >
> >       for_each_possible_cpu(cpu)
> >               INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));
> >
> > -     m->xsk_map = bpf_map_area_alloc(m->map.max_entries *
> > -                                     sizeof(struct xdp_sock *),
> > -                                     m->map.numa_node);
> > -     if (!m->xsk_map)
> > -             goto free_percpu;
> >       return &m->map;
> > -
> > -free_percpu:
> > -     free_percpu(m->flush_list);
> > -free_charge:
> > -     bpf_map_charge_finish(&m->map.memory);
> > -free_m:
> > -     kfree(m);
> > -     return ERR_PTR(err);
> >  }
> >
> >  static void xsk_map_free(struct bpf_map *map)