All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Wang <jasowang@redhat.com>
Cc: David Miller <davem@davemloft.net>,
	Michal Kubecek <mkubecek@suse.cz>,
	Network Development <netdev@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Vlad Yasevic <vyasevic@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: regression: UFO removal breaks kvm live migration
Date: Fri, 17 Nov 2017 09:31:23 -0500	[thread overview]
Message-ID: <CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> (raw)
In-Reply-To: <CAF=yD-J3+YJgxEYmaFRgk2QFfYLyu7fKDCm=qkrVo9qBXRHoTA@mail.gmail.com>

On Fri, Nov 10, 2017 at 12:32 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Wed, Nov 8, 2017 at 9:53 PM, Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>> On 2017年11月08日 20:32, David Miller wrote:
>>>
>>> From: Jason Wang <jasowang@redhat.com>
>>> Date: Wed, 8 Nov 2017 17:25:48 +0900
>>>
>>>> On 2017年11月08日 17:08, Willem de Bruijn wrote:
>>>>>
>>>>> That won't help in the short term. I'm still reading up to see if
>>>>> there are
>>>>> any other options besides reimplement or advertise-but-drop, such as
>>>>> an implicit trigger that would make the guest renegotiate. It's
>>>>> unlikely, but
>>>>> worth a look..
>>>>
>>>> Yes, this looks hard. And even if we can manage to do this, it looks
>>>> an overkill since it will impact all guest after migration.
>>>
>>> Like Willem I would much prefer "advertise-but-drop" if it works.
>>
>>
>> This makes migration work but all guest UFO traffic will stall.
>>
>>>
>>> In the long term feature renegotiation triggers are a must.
>>>
>>> There is no way for us to remove features otherwise.
>>
>>
>> We can remove if we don't break userspace(guest).
>>
>>> In my opinion
>>> this will even make migrations more powerful.
>>
>>
>> But this does not help for guest running old version of kernel which still
>> think UFO work.
>
> Indeed, if we have to support live migration of arbitrary old guests
> without any expectations on hypervisor version either, features can
> simply never be reverted, even if a negotiation interface exists.
>
> At least for upcoming features and devices, guest code should not
> have this expectation, but from the start allow renegation such as
> CTRL_GUEST_OFFLOADS [1] based on a host trigger. But for
> tuntap TUNSETOFFLOAD it seems that ship has sailed.
>
> Okay, I will send a patch to reinstate UFO for this use case (only). There
> is some related work in tap_handle_frame and packet_direct_xmit to
> segment directly in the device. I will be traveling the next few days, so
> it won't be in time for 4.14 (but can go in stable later, of course).

I'm finishing up and running some tests. The majority of the patch is a
straightforward partial revert of the patchset, so while fairly large for a
patch to net (~150 lines, esp. in udp[46]_ufo_fragment), that is all
thoroughly tested code. Notably absent are the protocol layer and
hardware support (NETIF_F_UFO) portions.

The only open issue is whether to rely on existing skb_gso_segment
processing in the transmit path from validate_xmit_skb or to add new
skb_gso_segment calls directly to tun_get_user, tap_get_user and
pf_packet. Tun has to loop around four different ways of injecting
packets into the device. Something like the below snippet.

More conservative is to introduce no completely new code and rely on
validate_xmit_skb, but that means having to protect the entire stack
against skbs with SKB_GSO_UDP, so also bringing back some
checksum and fragment handling snippets in gre_gso_segment,
__skb_udp_tunnel_segment, act_csum and openvswitch.

A third option is to send the conservative approach to net, then
in net-next follow up with a patch to plug the SKB_GSO_UDP
directly in the devices and revert the tunnel/act/openvswitch stanzas
I'm leaning towards that approach.

@@ -1380,7 +1380,7 @@ static ssize_t tun_get_user(struct tun_struct
*tun, struct tun_file *tfile,
                            int noblock, bool more)
 {
        struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
-       struct sk_buff *skb;
+       struct sk_buff *skb, *segs = NULL;
        size_t total_len = iov_iter_count(from);
        size_t len = total_len, align = tun->align, linear;
        struct virtio_net_hdr gso = { 0 };
@@ -1552,12 +1552,33 @@ static ssize_t tun_get_user(struct tun_struct
*tun, struct tun_file *tfile,
        }

        rxhash = __skb_get_hash_symmetric(skb);
+
+       if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP) {
+               skb_push(skb, ETH_HLEN);
+               segs = __skb_gso_segment(skb, netif_skb_features(skb), false);
+
+               if (IS_ERR(segs)) {
+                       kfree_skb(skb);
+                       return PTR_ERR(segs);
+               }
+
+               if (segs) {
+                       consume_skb(skb);
+                       skb = segs;
+               }
+again:
+               skb_pull(skb, ETH_HLEN);
+               segs = skb->next;
+               skb->next = NULL;
+       }
+
 #ifndef CONFIG_4KSTACKS
-        tun_rx_batched(tun, tfile, skb, more);
+       tun_rx_batched(tun, tfile, skb, more || segs);
 #else
        netif_rx_ni(skb);
 #endif

+       if (segs) {
+               skb = segs;
+               goto again;
+       }

  parent reply	other threads:[~2017-11-17 14:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-07  8:02 regression: UFO removal breaks kvm live migration Michal Kubecek
2017-11-08  3:36 ` Willem de Bruijn
2017-11-08  6:26   ` David Miller
2017-11-08  7:49     ` Jason Wang
2017-11-08  8:08       ` Willem de Bruijn
2017-11-08  8:25         ` Jason Wang
2017-11-08 11:32           ` David Miller
2017-11-08 12:53             ` Jason Wang
2017-11-08 12:58               ` David Miller
2017-11-10  5:32               ` Willem de Bruijn
2017-11-10  5:59                 ` David Miller
2017-11-17 14:31                 ` Willem de Bruijn [this message]
2017-11-17 14:48                   ` Willem de Bruijn
2017-11-17 23:00                     ` Willem de Bruijn
2017-11-08 16:01             ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=mkubecek@suse.cz \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vyasevic@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.