From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willem de Bruijn Subject: Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Date: Thu, 21 Dec 2017 12:11:08 -0500 Message-ID: References: <9615150a-eb78-2f9d-798f-6aa460932aec@01019freenet.de> <401a0715-fd28-63a3-8dfd-e89835d70db0@01019freenet.de> <11c25b88-af9b-a1f7-b5f5-0420c75916d7@01019freenet.de> <20171208084751.tom4auppogz4lanz@unicorn.suse.cz> <20171208114025.kjcaratqcveq7zu5@unicorn.suse.cz> <96a16c1f-c026-f506-78c1-dad88471361d@01019freenet.de> <6f75bdf5-839b-8c84-c8be-e83d071b245e@maya.org> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Cc: Michal Kubecek , Jason Wang , David Miller , Network Development To: Andreas Hartmann Return-path: Received: from mail-ot0-f177.google.com ([74.125.82.177]:42655 "EHLO mail-ot0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751766AbdLURLt (ORCPT ); Thu, 21 Dec 2017 12:11:49 -0500 Received: by mail-ot0-f177.google.com with SMTP id i1so23011107oth.9 for ; Thu, 21 Dec 2017 09:11:49 -0800 (PST) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Dec 21, 2017 at 12:05 PM, Andreas Hartmann wrote: > On 12/20/2017 at 11:44 PM Willem de Bruijn wrote: >> >> On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann >> wrote: >>> >>> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote: >>>> >>>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote: >>> >>> [...] >>>>> >>>>> I have been able to reproduce the hang by sending a UFO packet >>>>> between two guests running v4.13 on a host running v4.15-rc1. >>>>> >>>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from >>>>> vhost_zerocopy_callback being called for each segment of a >>>>> segmented UFO skb. This refcount is decremented then on each >>>>> segment, but incremented only once for the entire UFO skb. >>>>> >>>>> Before v4.14, these packets would be converted in skb_segment to >>>>> regular copy packets with skb_orphan_frags and the callback function >>>>> called once at this point. v4.14 added support for reference counted >>>>> zerocopy skb that can pass through skb_orphan_frags unmodified and >>>>> have their zerocopy state safely cloned with skb_zerocopy_clone. >>>>> >>>>> The call to skb_zerocopy_clone must come after skb_orphan_frags >>>>> to limit cloning of this state to those skbs that can do so safely. >>>>> >>>>> Please try a host with the following patch. This fixes it for me. I >>>>> intend to >>>>> send it to net. >>>>> >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>> index a592ca025fc4..d2d985418819 100644 >>>>> --- a/net/core/skbuff.c >>>>> +++ b/net/core/skbuff.c >>>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff >>>>> *head_skb, >>>>> >>>>> skb_shinfo(nskb)->tx_flags |= >>>>> skb_shinfo(head_skb)->tx_flags & >>>>> SKBTX_SHARED_FRAG; >>>>> - if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC)) >>>>> - goto err; >>>>> >>>>> while (pos < offset + len) { >>>>> if (i >= nfrags) { >>>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff >>>>> *head_skb, >>>>> >>>>> if (unlikely(skb_orphan_frags(frag_skb, >>>>> GFP_ATOMIC))) >>>>> goto err; >>>>> + if (skb_zerocopy_clone(nskb, frag_skb, >>>>> GFP_ATOMIC)) >>>>> + goto err; >>>>> >>>>> *nskb_frag = *frag; >>>>> __skb_frag_ref(nskb_frag); >>>>> >>>>> >>>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each >>>>> frag >>>>> in the frags[] array. I will follow-up with a patch to net-next that >>>>> only >>>>> checks once per skb: >>>>> >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>> index 466581cf4cdc..a293a33604ec 100644 >>>>> --- a/net/core/skbuff.c >>>>> +++ b/net/core/skbuff.c >>>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff >>>>> *head_skb, >>>>> >>>>> skb_shinfo(nskb)->tx_flags |= >>>>> skb_shinfo(head_skb)->tx_flags & >>>>> SKBTX_SHARED_FRAG; >>>>> - if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC)) >>>>> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || >>>>> + skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) >>>>> goto err; >>>>> >>>>> while (pos < offset + len) { >>>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff >>>>> *head_skb, >>>>> >>>>> BUG_ON(!nfrags); >>>>> >>>>> + if (skb_orphan_frags(frag_skb, >>>>> GFP_ATOMIC) || >>>>> + skb_zerocopy_clone(nskb, frag_skb, >>>>> + GFP_ATOMIC)) >>>>> + goto err; >>>>> + >>>>> list_skb = list_skb->next; >>>>> } >>>>> >>>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff >>>>> *head_skb, >>>>> goto err; >>>>> } >>>>> >>>>> - if (unlikely(skb_orphan_frags(frag_skb, >>>>> GFP_ATOMIC))) >>>>> - goto err; >>>>> - >>>> >>>> >>>> I'm currently testing this one. >>>> >>> >>> Test is in progress. I'm testing w/ 4.14.7, which already contains "net: >>> accept UFO datagrams from tuntap and packet". >>> >>> At first, I tested an unpatched 4.14.7 - the problem (no more killable >>> qemu-process) did occur promptly on shutdown of the machine. This was >>> expected. >>> >>> Next, I applied the above patch (the second one). Until now, I didn't >>> face any problem any more on shutdown of VMs. Looks promising. >> >> >> Thanks for testing. >> >> I sent the first, simpler, one to net together with another fix. >> >> http://patchwork.ozlabs.org/patch/851715/ >> > > If I'm using the second patch above (the more efficient one and not > "[net,1/2] skbuff: orphan frags before zerocopy clone"), which I'm already > testing here: Is it still necessary to apply this patch "[net,2/2] skbuff: > skb_copy_ubufs must release uarg even without user frags"? Not for this issue. It is an unrelated bug and not triggered by virtio_net as configured normally.