From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66FB8C43381 for ; Sat, 2 Mar 2019 01:08:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2ABFE2083E for ; Sat, 2 Mar 2019 01:08:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=netronome-com.20150623.gappssmtp.com header.i=@netronome-com.20150623.gappssmtp.com header.b="Vfk5Xv5q" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727125AbfCBBIl (ORCPT ); Fri, 1 Mar 2019 20:08:41 -0500 Received: from mail-qt1-f195.google.com ([209.85.160.195]:33323 "EHLO mail-qt1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725996AbfCBBIl (ORCPT ); Fri, 1 Mar 2019 20:08:41 -0500 Received: by mail-qt1-f195.google.com with SMTP id z39so30087654qtz.0 for ; Fri, 01 Mar 2019 17:08:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netronome-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:in-reply-to:references :organization:mime-version:content-transfer-encoding; bh=7ZciXvwOJvrm+uZ4YyEDhRFwotkU8qCJRwIylNBFoGc=; b=Vfk5Xv5qgvsRRBHsEU7ZE3ty8n/OlvtmFReNo+6cwWhKo/xERcPUm45BpfihsFohpe XScWt4qRjxNLHAM4YF0aZ0CLPoAVjIZaibRM8E+IFf4+yP4BlC13LekhYytu6MEa6X0+ F6BsDHgNLq55TNOyIqhfstW3xt6JZwXsu/TRK04e0UPuIvcncjsmu3TUfzmwk3KmaE3W 1LXuZCTskZ3UNKsKuRdknWf8U5DSv7zj4cmFX6EeaI00QdXa0VY0PSlmFYazRQg4/dPR 0obCnCTOytNlPF9F4Y7TDDBY4tBa4tPKFeaOg0+FKkUylZGDsTfZl8/Yqpbz+V9e7pJ4 ZoCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=7ZciXvwOJvrm+uZ4YyEDhRFwotkU8qCJRwIylNBFoGc=; b=emI0KzCuUpc4++JXj/2ZNz9L7tOLRjwFmpDhTftvFf8BKd+1p5ygqcaAwKg5H/666D cCplxe9rvTTfRAZoVMDfVYd2jHhlkGQfOsLp09eGBml8Tz0+6WJQHG5MgB+iBHzCSl9S u60s5bDXmIIH1ocQGZrhFB1xjM3bZMcV+ImOoDIghaTQwX0ub/1KVwidTgiAZQuDb2Fa 2K/LSZmx5bW0G4ul5/wSWMkgQT1OPc3Vie87EBaCRM3c+ZXZCZhWb6zG9AVBrL4UvQeg 43J+pHupYx1R6Z9MhjroBk8aPmJIxBsXH215S/OQb45ROf2PKhqjHB87vK/tujJVVQyg p5Rw== X-Gm-Message-State: APjAAAXZKnO7HA10RCzO7P32qmCislJ2XQdFwESKKp7akgP1s5dUQyjo I/yyPx6uCy52kcNCw9F6c4bC8g== X-Google-Smtp-Source: APXvYqw1+opxpSMkZcJrMvHhnkvuG7ZrWg72WQHZ9x/hTE5rFdpQDy99ZwgPyaGz7yq3jc8lEie8YQ== X-Received: by 2002:a0c:d783:: with SMTP id z3mr2961551qvi.190.1551488919837; Fri, 01 Mar 2019 17:08:39 -0800 (PST) Received: from cakuba.netronome.com ([66.60.152.14]) by smtp.gmail.com with ESMTPSA id l36sm1186940qte.82.2019.03.01.17.08.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 01 Mar 2019 17:08:39 -0800 (PST) Date: Fri, 1 Mar 2019 17:08:31 -0800 From: Jakub Kicinski To: Toke =?UTF-8?B?SMO4aWxhbmQtSsO4cmdlbnNlbg==?= Cc: David Miller , netdev@vger.kernel.org, Jesper Dangaard Brouer , Daniel Borkmann , Alexei Starovoitov Subject: Re: [PATCH net-next v3 1/3] xdp: Refactor devmap code in preparation for subsequent additions Message-ID: <20190301170831.6ae29baa@cakuba.netronome.com> In-Reply-To: <155144955040.28287.1075106871059918653.stgit@alrua-x1> References: <155144955030.28287.14029975169967438162.stgit@alrua-x1> <155144955040.28287.1075106871059918653.stgit@alrua-x1> Organization: Netronome Systems, Ltd. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Fri, 01 Mar 2019 15:12:30 +0100, Toke H=C3=B8iland-J=C3=B8rgensen wrote: > The subsequent commits introducing default maps and a hash-based ifindex > devmap require a bit of refactoring of the devmap code. Perform this first > so the subsequent commits become easier to read. >=20 > Signed-off-by: Toke H=C3=B8iland-J=C3=B8rgensen > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c > index 191b79948424..1037fc08c504 100644 > --- a/kernel/bpf/devmap.c > +++ b/kernel/bpf/devmap.c > @@ -75,6 +75,7 @@ struct bpf_dtab { > struct bpf_dtab_netdev **netdev_map; > unsigned long __percpu *flush_needed; > struct list_head list; > + struct rcu_head rcu; I think the RCU change may warrant a separate commit or at least a mention in the commit message ;) > }; > =20 > static DEFINE_SPINLOCK(dev_map_lock); > @@ -85,23 +86,11 @@ static u64 dev_map_bitmap_size(const union bpf_attr *= attr) > return BITS_TO_LONGS((u64) attr->max_entries) * sizeof(unsigned long); > } > =20 > -static struct bpf_map *dev_map_alloc(union bpf_attr *attr) > +static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr, > + bool check_memlock) > { > - struct bpf_dtab *dtab; > - int err =3D -EINVAL; > u64 cost; > - > - if (!capable(CAP_NET_ADMIN)) > - return ERR_PTR(-EPERM); > - > - /* check sanity of attributes */ > - if (attr->max_entries =3D=3D 0 || attr->key_size !=3D 4 || > - attr->value_size !=3D 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK) > - return ERR_PTR(-EINVAL); perhaps consider moving these to a map_alloc_check callback? > - dtab =3D kzalloc(sizeof(*dtab), GFP_USER); > - if (!dtab) > - return ERR_PTR(-ENOMEM); > + int err; > =20 > bpf_map_init_from_attr(&dtab->map, attr); > =20 > @@ -109,60 +98,72 @@ static struct bpf_map *dev_map_alloc(union bpf_attr = *attr) > cost =3D (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *); > cost +=3D dev_map_bitmap_size(attr) * num_possible_cpus(); > if (cost >=3D U32_MAX - PAGE_SIZE) > - goto free_dtab; > + return -EINVAL; > =20 > dtab->map.pages =3D round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; > =20 > - /* if map size is larger than memlock limit, reject it early */ > - err =3D bpf_map_precharge_memlock(dtab->map.pages); > - if (err) > - goto free_dtab; > - > - err =3D -ENOMEM; > + if (check_memlock) { > + /* if map size is larger than memlock limit, reject it early */ > + err =3D bpf_map_precharge_memlock(dtab->map.pages); > + if (err) > + return -EINVAL; > + } > =20 > /* A per cpu bitfield with a bit per possible net device */ > dtab->flush_needed =3D __alloc_percpu_gfp(dev_map_bitmap_size(attr), > __alignof__(unsigned long), > GFP_KERNEL | __GFP_NOWARN); > if (!dtab->flush_needed) > - goto free_dtab; > + goto err_alloc; > =20 > dtab->netdev_map =3D bpf_map_area_alloc(dtab->map.max_entries * > sizeof(struct bpf_dtab_netdev *), > dtab->map.numa_node); > if (!dtab->netdev_map) > - goto free_dtab; > + goto err_map; > =20 > - spin_lock(&dev_map_lock); > - list_add_tail_rcu(&dtab->list, &dev_map_list); > - spin_unlock(&dev_map_lock); > + return 0; > =20 > - return &dtab->map; > -free_dtab: > + err_map: The label should name the first action. So it was correct, you're making it less good :) Also why the space? > free_percpu(dtab->flush_needed); > - kfree(dtab); > - return ERR_PTR(err); > + err_alloc: and no need for this one > + return -ENOMEM; > } > =20 > -static void dev_map_free(struct bpf_map *map) > +static struct bpf_map *dev_map_alloc(union bpf_attr *attr) > { > - struct bpf_dtab *dtab =3D container_of(map, struct bpf_dtab, map); > - int i, cpu; > + struct bpf_dtab *dtab; > + int err; > =20 > - /* At this point bpf_prog->aux->refcnt =3D=3D 0 and this map->refcnt = =3D=3D 0, > - * so the programs (can be more than one that used this map) were > - * disconnected from events. Wait for outstanding critical sections in > - * these programs to complete. The rcu critical section only guarantees > - * no further reads against netdev_map. It does __not__ ensure pending > - * flush operations (if any) are complete. > - */ > + if (!capable(CAP_NET_ADMIN)) > + return ERR_PTR(-EPERM); > + > + /* check sanity of attributes */ > + if (attr->max_entries =3D=3D 0 || attr->key_size !=3D 4 || > + attr->value_size !=3D 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK) > + return ERR_PTR(-EINVAL); > + > + dtab =3D kzalloc(sizeof(*dtab), GFP_USER); > + if (!dtab) > + return ERR_PTR(-ENOMEM); > + > + err =3D dev_map_init_map(dtab, attr, true); > + if (err) { > + kfree(dtab); > + return ERR_PTR(err); > + } > =20 > spin_lock(&dev_map_lock); > - list_del_rcu(&dtab->list); > + list_add_tail_rcu(&dtab->list, &dev_map_list); > spin_unlock(&dev_map_lock); > =20 > - bpf_clear_redirect_map(map); > - synchronize_rcu(); > + return &dtab->map; > +} > + > +static void __dev_map_free(struct rcu_head *rcu) > +{ > + struct bpf_dtab *dtab =3D container_of(rcu, struct bpf_dtab, rcu); > + int i, cpu; > =20 > /* To ensure all pending flush operations have completed wait for flush > * bitmap to indicate all flush_needed bits to be zero on _all_ cpus. > @@ -192,6 +193,26 @@ static void dev_map_free(struct bpf_map *map) There is a cond_resched() here, I don't think you can call cond_resched() from an RCU callback. > kfree(dtab); > } > =20 > +static void dev_map_free(struct bpf_map *map) > +{ > + struct bpf_dtab *dtab =3D container_of(map, struct bpf_dtab, map); > + > + /* At this point bpf_prog->aux->refcnt =3D=3D 0 and this map->refcnt = =3D=3D 0, > + * so the programs (can be more than one that used this map) were > + * disconnected from events. Wait for outstanding critical sections in > + * these programs to complete. The rcu critical section only guarantees > + * no further reads against netdev_map. It does __not__ ensure pending > + * flush operations (if any) are complete. > + */ > + > + spin_lock(&dev_map_lock); > + list_del_rcu(&dtab->list); > + spin_unlock(&dev_map_lock); > + > + bpf_clear_redirect_map(map); > + call_rcu(&dtab->rcu, __dev_map_free); > +} > + > static int dev_map_get_next_key(struct bpf_map *map, void *key, void *ne= xt_key) > { > struct bpf_dtab *dtab =3D container_of(map, struct bpf_dtab, map);