All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Andreas Hartmann <andihartmann@01019freenet.de>
Cc: Michal Kubecek <mkubecek@suse.cz>,
	Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
Date: Sun, 17 Dec 2017 17:33:36 -0500	[thread overview]
Message-ID: <CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> (raw)
In-Reply-To: <c2044fb8-a2ce-241f-b1ce-054ac70a327d@01019freenet.de>

On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>>> can't be killed any more.
>>>> Because of the stack trace
>>>>
>>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> I was hoping, that the problems could be related - but that seems not to
>>>> be true.
>>>
>>> However, it turned out, that reverting the complete patchset "Remove UDP
>>> Fragmentation Offload support" prevent hanging qemu processes.
>>
>> That implies a combination of UFO and vhost zerocopy. Disabling
>> experimental_zcopytx in vhost_net will probably work around the bug
>> then.

I have been able to reproduce the hang by sending a UFO packet
between two guests running v4.13 on a host running v4.15-rc1.

The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
vhost_zerocopy_callback being called for each segment of a
segmented UFO skb. This refcount is decremented then on each
segment, but incremented only once for the entire UFO skb.

Before v4.14, these packets would be converted in skb_segment to
regular copy packets with skb_orphan_frags and the callback function
called once at this point. v4.14 added support for reference counted
zerocopy skb that can pass through skb_orphan_frags unmodified and
have their zerocopy state safely cloned with skb_zerocopy_clone.

The call to skb_zerocopy_clone must come after skb_orphan_frags
to limit cloning of this state to those skbs that can do so safely.

Please try a host with the following patch. This fixes it for me. I intend to
send it to net.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a592ca025fc4..d2d985418819 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-                       goto err;

                while (pos < offset + len) {
                        if (i >= nfrags) {
@@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                        if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
                                goto err;
+                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+                               goto err;

                        *nskb_frag = *frag;
                        __skb_frag_ref(nskb_frag);


This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
in the frags[] array. I will follow-up with a patch to net-next that only
checks once per skb:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466581cf4cdc..a293a33604ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
+               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
                        goto err;

                while (pos < offset + len) {
@@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                                BUG_ON(!nfrags);

+                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                                   skb_zerocopy_clone(nskb, frag_skb,
+                                                      GFP_ATOMIC))
+                                       goto err;
+
                                list_skb = list_skb->next;
                        }

@@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
                                goto err;
                        }

-                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
-                               goto err;
-

I'll also send to net-next

(1) a patch to convert its vhost_net_ ubuf_ref refcnt to refcount_t

(2) a path to skb_zerocopy_clone to warn on clone if not
     sock_zerocopy_callback

> I already tested it w/ options vhost_net experimental_zcopytx=0 - but
> this didn't "resolve" anything. See
> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>
> Therefore, I think your following thoughts are lapsed unfortunately,
> aren't they?

That experiment was perhaps run before commit 0c19f846d582 ("net:
accept UFO datagrams from tuntap and packet") and hit the other UFO
bug.

  reply	other threads:[~2017-12-17 22:34 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-26 14:17 Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected Andreas Hartmann
2017-11-27 16:46 ` Andreas Hartmann
2017-11-27 16:55   ` Michal Kubecek
2017-11-27 19:09     ` Andreas Hartmann
2017-12-01 10:11 ` Andreas Hartmann
2017-12-03 11:35   ` Andreas Hartmann
2017-12-04 16:28     ` Andreas Hartmann
2017-12-05  3:50       ` Jason Wang
2017-12-05 16:23         ` Andreas Hartmann
2017-12-06  3:08           ` Jason Wang
2017-12-08  7:21             ` Andreas Hartmann
2017-12-08  8:47               ` Michal Kubecek
2017-12-08 10:31                 ` Andreas Hartmann
2017-12-08 11:40                   ` Michal Kubecek
2017-12-08 12:45                     ` Andreas Hartmann
2017-12-08 12:58                       ` Michal Kubecek
2017-12-08 13:13                         ` Andreas Hartmann
2017-12-08 15:11                           ` Jason Wang
2017-12-08 16:04                     ` Willem de Bruijn
2017-12-08 20:11                       ` Andreas Hartmann
2017-12-08 20:44                         ` Andreas Hartmann
2017-12-11 15:54                           ` Andreas Hartmann
2017-12-14 16:31                             ` Andreas Hartmann
2017-12-14 22:17                             ` Willem de Bruijn
2017-12-14 22:47                               ` Willem de Bruijn
2017-12-15  6:05                               ` Andreas Hartmann
2017-12-17 22:33                                 ` Willem de Bruijn [this message]
2017-12-18 17:11                                   ` Andreas Hartmann
2017-12-20 15:56                                     ` Andreas Hartmann
2017-12-20 22:44                                       ` Willem de Bruijn
2017-12-21 17:05                                         ` Andreas Hartmann
2017-12-21 17:11                                           ` Willem de Bruijn
2017-12-24 16:24                                       ` Andreas Hartmann
2017-12-24 18:54                                         ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=andihartmann@01019freenet.de \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=mkubecek@suse.cz \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.