From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cong Wang
Subject: Re: [PATCH net-next] net: preserve sock reference when scrubbing the skb.
Date: Wed, 27 Jun 2018 12:55:37 -0700
Message-ID:
References: <20180625155610.30802-1-fbl@redhat.com> <48e15faf-f935-0166-e1db-18f7286e7264@gmail.com> <20180626220300.GT19565@plex.lan> <096ada36-8e05-c330-e5b3-3f6fcc77aea2@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Flavio Leitner , Linux Kernel Network Developers , Paolo Abeni , David Miller , Florian Westphal , NetFilter
To: Eric Dumazet
Return-path:
Received: from mail-pg0-f67.google.com ([74.125.83.67]:43919 "EHLO mail-pg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965955AbeF0Tzt (ORCPT ); Wed, 27 Jun 2018 15:55:49 -0400
In-Reply-To: <096ada36-8e05-c330-e5b3-3f6fcc77aea2@gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Jun 27, 2018 at 12:33 PM Eric Dumazet wrote:
>
> On 06/27/2018 11:59 AM, Cong Wang wrote:
> >
> > IIRC, this skb_orphan() was introduced much earlier than TSQ, probably
> > from the beginning of veth.
>
> Sigh
>
> SO_SNDBUF was invented years ago before veth.

Yeah, probably when there was only one stack on one host.
SO_SNDBUF is defined on a per-networking-stack basis.

>
> You focus on TSQ while it is only one of the many things that are broken.
>

I think it is the opposite: this patchset _potentially_ breaks things
rather than fixing anything.

> >
> > Leaving the stack should be effectively equivalent to leaving the host,
> > from the view of network isolation.
> >
>
> Having a UDP socket being able to burn a cpu and fill a qdisc is a major bug.
>

Very true, network isolation never isolates CPU or memory.

It is cpuset's job to provide physical CPU isolation, not the network
namespace's. I don't want to defend this, but it is the current design.

> By default (blocking send() syscalls) the following loop should
> block the thread if socket sk_wmem_alloc hits sk_sndbuf, this is
> the beauty of backpressure.
>
> while (1)
>     send(fd, ...);
>
> With skb_orphan(), sk_wmem_alloc will stay around 0, so the loop will burn a cpu
> and fill a qdisc, eventually breaking "network isolation", since other sockets
> might be unable to send a single packet.

Won't the same happen when there is congestion on a physical connection
between two hosts? Does 'host isolation' break too?

>
> If you have a concrete case where the skb_orphan() is needed, then you will have
> to add a parameter to let the admin opt-in for this.

Please see the other reply from me, where I list 3 or 4 reasons.
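
For anyone following the backpressure argument, here is a minimal userspace sketch of the behaviour the quoted send() loop relies on. It is not the veth/UDP case from this thread and does not exercise skb_orphan(): it assumes an AF_UNIX stream socketpair whose peer never reads, which is only a convenient stand-in for "queued data still charged to the sender". The helper name bytes_until_backpressure() is hypothetical. The socket is made non-blocking purely so the demo terminates: the point where send() fails with EAGAIN is where a blocking send() would put the thread to sleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue data on one end of a socketpair whose peer
 * never reads, and return how many bytes the kernel accepted before it
 * refused more. Returns -1 on an unexpected error.
 */
long bytes_until_backpressure(int sndbuf_bytes)
{
    int sv[2];
    char chunk[1024];
    long queued = 0;
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Request a small send buffer; the kernel may round the value up. */
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF,
                   &sndbuf_bytes, sizeof(sndbuf_bytes)) < 0)
        goto fail;

    /* Non-blocking only so the loop ends instead of sleeping forever. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    memset(chunk, 'x', sizeof(chunk));

    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), 0);

        if (n >= 0) {
            queued += n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;      /* send buffer full: a blocking send() would sleep here */
        goto fail;
    }

    close(sv[0]);
    close(sv[1]);
    return queued;

fail:
    close(sv[0]);
    close(sv[1]);
    return -1;
}
```

Calling bytes_until_backpressure(4096) returns a finite, positive byte count. If the sender's accounting were dropped as soon as the data left the socket, nothing would stop the loop at the send buffer limit, which is the situation the quoted text describes.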