From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= Subject: Re: [RFC PATCH 07/14] packet: wire up zerocopy for AF_PACKET V4 Date: Fri, 3 Nov 2017 11:47:24 +0100 Message-ID: References: <20171031124145.9667-1-bjorn.topel@gmail.com> <20171031124145.9667-8-bjorn.topel@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: "Karlsson, Magnus" , Alexander Duyck , Alexander Duyck , John Fastabend , Alexei Starovoitov , Jesper Dangaard Brouer , michael.lundkvist@ericsson.com, ravineet.singh@ericsson.com, Daniel Borkmann , Network Development , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , jesse.brandeburg@intel.com, anjali.singhai@intel.com, rami.rosen@intel.com, jeffrey.b.shaw@intel.com, ferruh.yigit@intel.com, qi.z.zhang@intel.com To: Willem de Bruijn Return-path: Received: from mail-qt0-f194.google.com ([209.85.216.194]:54250 "EHLO mail-qt0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750826AbdKCKr0 (ORCPT ); Fri, 3 Nov 2017 06:47:26 -0400 Received: by mail-qt0-f194.google.com with SMTP id n61so2636333qte.10 for ; Fri, 03 Nov 2017 03:47:25 -0700 (PDT) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: 2017-11-03 4:17 GMT+01:00 Willem de Bruijn : > On Tue, Oct 31, 2017 at 9:41 PM, Bj=C3=B6rn T=C3=B6pel wrote: >> From: Bj=C3=B6rn T=C3=B6pel >> >> This commits adds support for zerocopy mode. Note that zerocopy mode >> requires that the network interface has been bound to the socket using >> the bind syscall, and that the corresponding netdev implements the >> AF_PACKET V4 ndos. >> >> Signed-off-by: Bj=C3=B6rn T=C3=B6pel >> --- >> + >> +static void packet_v4_disable_zerocopy(struct net_device *dev, >> + struct tp4_netdev_parms *zc) >> +{ >> + struct tp4_netdev_parms params; >> + >> + params =3D *zc; >> + params.command =3D TP4_DISABLE; >> + >> + (void)dev->netdev_ops->ndo_tp4_zerocopy(dev, ¶ms); > > Don't ignore error return codes. > Will fix! >> +static int packet_v4_zerocopy(struct sock *sk, int qp) >> +{ >> + struct packet_sock *po =3D pkt_sk(sk); >> + struct socket *sock =3D sk->sk_socket; >> + struct tp4_netdev_parms *zc =3D NULL; >> + struct net_device *dev; >> + bool if_up; >> + int ret =3D 0; >> + >> + /* Currently, only RAW sockets are supported.*/ >> + if (sock->type !=3D SOCK_RAW) >> + return -EINVAL; >> + >> + rtnl_lock(); >> + dev =3D packet_cached_dev_get(po); >> + >> + /* Socket needs to be bound to an interface. */ >> + if (!dev) { >> + rtnl_unlock(); >> + return -EISCONN; >> + } >> + >> + /* The device needs to have both the NDOs implemented. */ >> + if (!(dev->netdev_ops->ndo_tp4_zerocopy && >> + dev->netdev_ops->ndo_tp4_xmit)) { >> + ret =3D -EOPNOTSUPP; >> + goto out_unlock; >> + } > > Inconsistent error handling with above test. > Will fix. >> + >> + if (!(po->rx_ring.pg_vec && po->tx_ring.pg_vec)) { >> + ret =3D -EOPNOTSUPP; >> + goto out_unlock; >> + } > > A ring can be unmapped later with packet_set_ring. Should that operation > fail if zerocopy is enabled? After that, it can also change version with > PACKET_VERSION. > You're correct, I've missed this. I need to revisit the scenario when a ring is unmapped, and recreated. Thanks for pointing this out. >> + >> + if_up =3D dev->flags & IFF_UP; >> + zc =3D rtnl_dereference(po->zc); >> + >> + /* Disable */ >> + if (qp <=3D 0) { >> + if (!zc) >> + goto out_unlock; >> + >> + packet_v4_disable_zerocopy(dev, zc); >> + rcu_assign_pointer(po->zc, NULL); >> + >> + if (if_up) { >> + spin_lock(&po->bind_lock); >> + register_prot_hook(sk); >> + spin_unlock(&po->bind_lock); >> + } > > There have been a bunch of race conditions in this bind code. We need > to be very careful with adding more states to the locking, especially whe= n > open coding in multiple locations, as this patch does. I counted at least > four bind locations. See for instance also > http://patchwork.ozlabs.org/patch/813945/ > Yeah, the locking schemes in AF_PACKET is pretty convoluted. I'll document and make the locking more explicit (and avoiding open coding it). > >> + >> + goto out_unlock; >> + } >> + >> + /* Enable */ >> + if (!zc) { >> + zc =3D kzalloc(sizeof(*zc), GFP_KERNEL); >> + if (!zc) { >> + ret =3D -ENOMEM; >> + goto out_unlock; >> + } >> + } >> + >> + if (zc->queue_pair >=3D 0) >> + packet_v4_disable_zerocopy(dev, zc); > > This calls disable even if zc was freshly allocated. > Shoud be > 0? > Good catch. It should be > 0. >> static int packet_release(struct socket *sock) >> { >> + struct tp4_netdev_parms *zc; >> struct sock *sk =3D sock->sk; >> + struct net_device *dev; >> struct packet_sock *po; >> struct packet_fanout *f; >> struct net *net; >> @@ -3337,6 +3541,20 @@ static int packet_release(struct socket *sock) >> sock_prot_inuse_add(net, sk->sk_prot, -1); >> preempt_enable(); >> >> + rtnl_lock(); >> + zc =3D rtnl_dereference(po->zc); >> + dev =3D packet_cached_dev_get(po); >> + if (zc && dev) >> + packet_v4_disable_zerocopy(dev, zc); >> + if (dev) >> + dev_put(dev); >> + rtnl_unlock(); >> + >> + if (zc) { >> + synchronize_rcu(); >> + kfree(zc); >> + } > > Please use a helper function for anything this complex. Will fix. Thanks, Bj=C3=B6rn