From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Hartmann
Subject: Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
Date: Mon, 18 Dec 2017 18:11:06 +0100
Message-ID: <6f75bdf5-839b-8c84-c8be-e83d071b245e@maya.org>
References: <9615150a-eb78-2f9d-798f-6aa460932aec@01019freenet.de>
 <4efbaf24-f419-2c8e-c705-59a5242b0575@01019freenet.de>
 <881560f8-54ec-e946-50cb-b2e80ddb5f97@01019freenet.de>
 <73b7a7b0-4264-2bd0-9e65-69841377f09f@redhat.com>
 <401a0715-fd28-63a3-8dfd-e89835d70db0@01019freenet.de>
 <11c25b88-af9b-a1f7-b5f5-0420c75916d7@01019freenet.de>
 <20171208084751.tom4auppogz4lanz@unicorn.suse.cz>
 <20171208114025.kjcaratqcveq7zu5@unicorn.suse.cz>
 <96a16c1f-c026-f506-78c1-dad88471361d@01019freenet.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
To: Willem de Bruijn
Received: from mout2.freenet.de ([195.4.92.92]:44572 "EHLO mout2.freenet.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935425AbdLRROS
 (ORCPT ); Mon, 18 Dec 2017 12:14:18 -0500
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org

On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
> wrote:
>> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>>>> can't be killed any more.
>>>>> Because of the stack trace
>>>>>
>>>>> [] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>>> [] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>>> [] do_vfs_ioctl+0x8f/0x5c0
>>>>> [] SyS_ioctl+0x74/0x80
>>>>> [] do_syscall_64+0x5b/0x100
>>>>> [] entry_SYSCALL64_slow_path+0x25/0x25
>>>>> [] 0xffffffffffffffff
>>>>>
>>>>> I was hoping, that the problems could be related - but that seems not to
>>>>> be true.
>>>>
>>>> However, it turned out, that reverting the complete patchset "Remove UDP
>>>> Fragmentation Offload support" prevent hanging qemu processes.
>>>
>>> That implies a combination of UFO and vhost zerocopy. Disabling
>>> experimental_zcopytx in vhost_net will probably work around the bug
>>> then.
>
> I have been able to reproduce the hang by sending a UFO packet
> between two guests running v4.13 on a host running v4.15-rc1.
>
> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
> vhost_zerocopy_callback being called for each segment of a
> segmented UFO skb. This refcount is decremented then on each
> segment, but incremented only once for the entire UFO skb.
>
> Before v4.14, these packets would be converted in skb_segment to
> regular copy packets with skb_orphan_frags and the callback function
> called once at this point. v4.14 added support for reference counted
> zerocopy skb that can pass through skb_orphan_frags unmodified and
> have their zerocopy state safely cloned with skb_zerocopy_clone.
>
> The call to skb_zerocopy_clone must come after skb_orphan_frags
> to limit cloning of this state to those skbs that can do so safely.
>
> Please try a host with the following patch. This fixes it for me. I intend to
> send it to net.
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a592ca025fc4..d2d985418819 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>  		skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>  					      SKBTX_SHARED_FRAG;
> -		if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> -			goto err;
>
>  		while (pos < offset + len) {
>  			if (i >= nfrags) {
> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>  			if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>  				goto err;
> +			if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +				goto err;
>
>  			*nskb_frag = *frag;
>  			__skb_frag_ref(nskb_frag);
>
>
> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
> in the frags[] array. I will follow-up with a patch to net-next that only
> checks once per skb:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 466581cf4cdc..a293a33604ec 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>  		skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>  					      SKBTX_SHARED_FRAG;
> -		if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> +		if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +		    skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>  			goto err;
>
>  		while (pos < offset + len) {
> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>  				BUG_ON(!nfrags);
>
> +				if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +				    skb_zerocopy_clone(nskb, frag_skb,
> +						       GFP_ATOMIC))
> +					goto err;
> +
>  				list_skb = list_skb->next;
>  			}
>
> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  				goto err;
>  			}
>
> -			if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
> -				goto err;
> -

I'm currently testing this one.

>
> I'll also send to net-next
>
> (1) a patch to convert its vhost_net_ubuf_ref refcnt to refcount_t
>
> (2) a patch to skb_zerocopy_clone to warn on clone if not
> sock_zerocopy_callback
>
>> I already tested it w/ options vhost_net experimental_zcopytx=0 - but
>> this didn't "resolve" anything. See
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>>
>> Therefore, I think your following thoughts are lapsed unfortunately,
>> aren't they?
>
> That experiment was perhaps run before commit 0c19f846d582 ("net:
> accept UFO datagrams from tuntap and packet") and hit the other UFO
> bug.

That's probably true.


Thanks,
Andreas
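
For anyone following the refcount explanation quoted above, here is a minimal userspace sketch of the imbalance. It is not kernel code. The struct, the function names and the starting counter value are all hypothetical, chosen only to illustrate the description "incremented only once for the entire UFO skb, decremented on each segment". It assumes a plain signed int counter with no locking.

```c
#include <assert.h>

/* Hypothetical stand-in for vhost_net_ubuf_ref; only the counter is modelled. */
struct ubuf_ref_model {
	int refcount;
};

/* Models queuing one zerocopy skb: one reference for the whole skb. */
static void model_submit_skb(struct ubuf_ref_model *ubufs)
{
	ubufs->refcount++;
}

/* Models one invocation of the zerocopy completion callback. */
static void model_zerocopy_callback(struct ubuf_ref_model *ubufs)
{
	ubufs->refcount--;
}

/*
 * Returns the counter value after one skb is submitted, split into
 * nsegs segments and completed.
 *
 * baseline             assumed counter value before the skb is submitted
 * per_segment_callback non-zero: callback fires once per segment (the
 *                      behaviour described for the bug); zero: callback
 *                      fires once for the whole skb
 */
static int model_segment_and_complete(int baseline, int nsegs,
				      int per_segment_callback)
{
	struct ubuf_ref_model ubufs = { .refcount = baseline };
	int calls = per_segment_callback ? nsegs : 1;
	int i;

	assert(nsegs >= 1);

	model_submit_skb(&ubufs);
	for (i = 0; i < calls; i++)
		model_zerocopy_callback(&ubufs);

	return ubufs.refcount;
}
```

With an assumed baseline of 0 and two segments, model_segment_and_complete(0, 2, 1) ends at -1, while model_segment_and_complete(0, 2, 0) returns to 0. In this model, a waiter that only proceeds once the counter reads exactly zero would never wake up in the first case, which would be consistent with the task stuck in vhost_net_ubuf_put_and_wait in the trace.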