All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
@ 2016-09-20 14:27 Sowmini Varadhan
  2016-09-20 15:31 ` Tom Herbert
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-20 14:27 UTC (permalink / raw)
  To: netdev; +Cc: davem, jbenc, hannes, aduyck, daniel, pabeni, sowmini.varadhan

The vxlan header is at offset (14 + 20 + 8) into the packet,
so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
functions to modify flags and vni field in the vxh.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 drivers/net/vxlan.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e7d1668..f903fa4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 		goto out_free;
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = VXLAN_HF_VNI;
-	vxh->vx_vni = vxlan_vni_field(vni);
+	put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
+	put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
 
 	if (type & SKB_GSO_TUNNEL_REMCSUM) {
 		unsigned int start;
+		__be32 tmpvni = get_unaligned(&vxh->vx_vni);
 
 		start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
-		vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
-		vxh->vx_flags |= VXLAN_HF_RCO;
+		tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
+		put_unaligned(tmpvni, &vxh->vx_vni);
+		put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);
 
 		if (!skb_is_gso(skb)) {
 			skb->ip_summed = CHECKSUM_NONE;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 14:27 [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb() Sowmini Varadhan
@ 2016-09-20 15:31 ` Tom Herbert
  2016-09-20 15:49   ` Sowmini Varadhan
  2016-09-20 16:11 ` Jiri Benc
  2016-09-20 16:45 ` Hannes Frederic Sowa
  2 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2016-09-20 15:31 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Linux Kernel Network Developers, David S. Miller, Jiri Benc,
	Hannes Frederic Sowa, Alexander Duyck, Daniel Borkmann,
	Paolo Abeni

On Tue, Sep 20, 2016 at 7:27 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> The vxlan header is at offset (14 + 20 + 8) into the packet,
> so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> functions to modify flags and vni field in the vxh.
>
I'm wondering if VXLAN is just the tip of the iceberg. Wouldn't this
is also be a problem in GRE and Geneve when they encapsulate Ethernet?
What about the outer IP header, wouldn't that potentially have
unaligned accesses also when creating it?

Tom

> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
>  drivers/net/vxlan.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e7d1668..f903fa4 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>                 goto out_free;
>
>         vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = VXLAN_HF_VNI;
> -       vxh->vx_vni = vxlan_vni_field(vni);
> +       put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
> +       put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
>
>         if (type & SKB_GSO_TUNNEL_REMCSUM) {
>                 unsigned int start;
> +               __be32 tmpvni = get_unaligned(&vxh->vx_vni);
>
>                 start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
> -               vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
> -               vxh->vx_flags |= VXLAN_HF_RCO;
> +               tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
> +               put_unaligned(tmpvni, &vxh->vx_vni);
> +               put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);
>
>                 if (!skb_is_gso(skb)) {
>                         skb->ip_summed = CHECKSUM_NONE;
> --
> 1.7.1
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 15:31 ` Tom Herbert
@ 2016-09-20 15:49   ` Sowmini Varadhan
  0 siblings, 0 replies; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-20 15:49 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, David S. Miller, Jiri Benc,
	Hannes Frederic Sowa, Alexander Duyck, Daniel Borkmann,
	Paolo Abeni

On (09/20/16 08:31), Tom Herbert wrote:
> 
> On Tue, Sep 20, 2016 at 7:27 AM, Sowmini Varadhan
> <sowmini.varadhan@oracle.com> wrote:
> > The vxlan header is at offset (14 + 20 + 8) into the packet,
> > so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> > functions to modify flags and vni field in the vxh.
> >
> I'm wondering if VXLAN is just the tip of the iceberg. Wouldn't this
> is also be a problem in GRE and Geneve when they encapsulate Ethernet?
> What about the outer IP header, wouldn't that potentially have
> unaligned accesses also when creating it?

Indeed. All of the above are true, and we have to conquer
each unaligned access, one at a time.

There's also the over-arching problem of arbitrary encapsulations
introducing intervening headers that throw off alignment. 
That would be a design discussion to be had with the ietf.

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 14:27 [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb() Sowmini Varadhan
  2016-09-20 15:31 ` Tom Herbert
@ 2016-09-20 16:11 ` Jiri Benc
  2016-09-20 16:31   ` Sowmini Varadhan
  2016-09-20 16:45 ` Hannes Frederic Sowa
  2 siblings, 1 reply; 23+ messages in thread
From: Jiri Benc @ 2016-09-20 16:11 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On Tue, 20 Sep 2016 10:27:00 -0400, Sowmini Varadhan wrote:
> The vxlan header is at offset (14 + 20 + 8) into the packet,
> so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> functions to modify flags and vni field in the vxh.

How did you calculate that? IP header should be aligned to 4 bytes, UDP
header is 8 bytes, thus VXLAN header is also aligned to 4 bytes.

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e7d1668..f903fa4 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>  		goto out_free;
>  
>  	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = VXLAN_HF_VNI;
> -	vxh->vx_vni = vxlan_vni_field(vni);
> +	put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
> +	put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
>  
>  	if (type & SKB_GSO_TUNNEL_REMCSUM) {
>  		unsigned int start;
> +		__be32 tmpvni = get_unaligned(&vxh->vx_vni);
>  
>  		start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
> -		vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
> -		vxh->vx_flags |= VXLAN_HF_RCO;
> +		tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
> +		put_unaligned(tmpvni, &vxh->vx_vni);
> +		put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);

If you went this way, it would be better to make two local variables
for vx_flags and vx_vni, store to them and do a single put_unaligned
after the condition. That way, you would have two less put_unaligned and
no get_unaligned in the remote csum case. And the code would be
cleaner. And you're missing vx_flags being accessed in
vxlan_build_gbp_hdr.

But I think this is not needed at all, see above.

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 16:11 ` Jiri Benc
@ 2016-09-20 16:31   ` Sowmini Varadhan
  2016-09-20 16:43     ` Jiri Benc
  0 siblings, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-20 16:31 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On (09/20/16 18:11), Jiri Benc wrote:
> > The vxlan header is at offset (14 + 20 + 8) into the packet,
> > so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> > functions to modify flags and vni field in the vxh.
> 
> How did you calculate that? IP header should be aligned to 4 bytes, UDP
> header is 8 bytes, thus VXLAN header is also aligned to 4 bytes.

The vxlan header is after the ethernet header (14 bytes),
IP header (20 bytes, assuming no options) and udp header (8 bytes).
Post the skb_reserve adjustments (see computations in in mld_newpack(),
for example), this triggers an unaligned access on sparc.

> If you went this way, it would be better to make two local variables
> for vx_flags and vx_vni, store to them and do a single put_unaligned
> after the condition. That way, you would have two less put_unaligned and
> no get_unaligned in the remote csum case.

Ok, that is certainly possible.

> And the code would be
> cleaner. And you're missing vx_flags being accessed in
> vxlan_build_gbp_hdr.

Sure, and there's also potential unaligned access in vxlan_build_gpe_hdr.

But as I was telling Tom, this problem is much deeper than this fix, and 
I dont have the facility, at the moment, to test out every one of these
code paths. We would have to fix these, one at a time, in subsequent
patches. This one just fixes the top-level basic code paths.

> But I think this is not needed at all, see above.
> 
>  Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 16:31   ` Sowmini Varadhan
@ 2016-09-20 16:43     ` Jiri Benc
  2016-09-20 17:07       ` Sowmini Varadhan
  2016-09-20 17:09       ` Jiri Benc
  0 siblings, 2 replies; 23+ messages in thread
From: Jiri Benc @ 2016-09-20 16:43 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On Tue, 20 Sep 2016 12:31:08 -0400, Sowmini Varadhan wrote:
> The vxlan header is after the ethernet header (14 bytes),
> IP header (20 bytes, assuming no options) and udp header (8 bytes).

If the VXLAN header is not aligned to 4 bytes, then IP header is not
aligned to 4 bytes, too, and you have greater problems than just VXLAN.

> Post the skb_reserve adjustments (see computations in in mld_newpack(),
> for example), this triggers an unaligned access on sparc.

IPv6 header is certainly not 20 bytes.

If you see unaligned access, something is wrong. But I very much doubt
that the problem is at the place you're trying to fix. Could you share
the traces with us?

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 14:27 [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb() Sowmini Varadhan
  2016-09-20 15:31 ` Tom Herbert
  2016-09-20 16:11 ` Jiri Benc
@ 2016-09-20 16:45 ` Hannes Frederic Sowa
  2 siblings, 0 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-20 16:45 UTC (permalink / raw)
  To: Sowmini Varadhan, netdev; +Cc: davem, jbenc, aduyck, daniel, pabeni

On 20.09.2016 16:27, Sowmini Varadhan wrote:
> The vxlan header is at offset (14 + 20 + 8) into the packet,
> so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> functions to modify flags and vni field in the vxh.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
>  drivers/net/vxlan.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e7d1668..f903fa4 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>  		goto out_free;
>  
>  	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = VXLAN_HF_VNI;
> -	vxh->vx_vni = vxlan_vni_field(vni);
> +	put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
> +	put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
>  
>  	if (type & SKB_GSO_TUNNEL_REMCSUM) {
>  		unsigned int start;
> +		__be32 tmpvni = get_unaligned(&vxh->vx_vni);
>  
>  		start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
> -		vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
> -		vxh->vx_flags |= VXLAN_HF_RCO;
> +		tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
> +		put_unaligned(tmpvni, &vxh->vx_vni);
> +		put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);
>  
>  		if (!skb_is_gso(skb)) {
>  			skb->ip_summed = CHECKSUM_NONE;
> 

The easiest way would be to make struct vxlanhdr __packed. Probably the
best way out of this instead of randomly adding put/get unaligned.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 16:43     ` Jiri Benc
@ 2016-09-20 17:07       ` Sowmini Varadhan
  2016-09-20 17:15         ` Jiri Benc
  2016-09-20 17:09       ` Jiri Benc
  1 sibling, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-20 17:07 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On (09/20/16 18:43), Jiri Benc wrote:
> 
> IPv6 header is certainly not 20 bytes.

vxlan encapsulates an IPv6/ethernet frame in a UDP/IPv4/ethernet packet.
the ipv4 header is 20 bytes (without options).
mld_newpack is dealing with the vxlan net_device in this case,
which has ->hard_header_len of 14, and ->needed_headroom of 64,
so hlen (used in mld_newpack) comes out to be 80.

> If you see unaligned access, something is wrong. But I very much doubt
> that the problem is at the place you're trying to fix. Could you share
> the traces with us?

Exactly what traces do you want? I think you can check this easily
by using printk's predicated on IS_ALIGNED checks in vxlan_build_skb().
IPv6 MLD triggers this quite easily.

I will try out Hannes' solution (which makes sense) in a moment, 
and report back.

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 16:43     ` Jiri Benc
  2016-09-20 17:07       ` Sowmini Varadhan
@ 2016-09-20 17:09       ` Jiri Benc
  2016-09-20 17:19         ` Tom Herbert
                           ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Jiri Benc @ 2016-09-20 17:09 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On Tue, 20 Sep 2016 18:43:33 +0200, Jiri Benc wrote:
> On Tue, 20 Sep 2016 12:31:08 -0400, Sowmini Varadhan wrote:
> > The vxlan header is after the ethernet header (14 bytes),
> > IP header (20 bytes, assuming no options) and udp header (8 bytes).
> 
> If the VXLAN header is not aligned to 4 bytes, then IP header is not
> aligned to 4 bytes, too, and you have greater problems than just VXLAN.

...which is indeed the case. Sorry, I guess I'm too much focused on
VXLAN-GPE nowadays and I missed that we're building this before the
inner Ethernet header.

But the point stands, we have much greater problems here than VXLAN.
And I don't think that wrapping all IP address accesses into
get/put_unaligned all around the stack is the solution.

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 17:07       ` Sowmini Varadhan
@ 2016-09-20 17:15         ` Jiri Benc
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Benc @ 2016-09-20 17:15 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On Tue, 20 Sep 2016 13:07:42 -0400, Sowmini Varadhan wrote:
> I will try out Hannes' solution (which makes sense) in a moment, 
> and report back.

No objections to marking the struct as packed. Will only push the
problem to a different place, though.

 Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 17:09       ` Jiri Benc
@ 2016-09-20 17:19         ` Tom Herbert
  2016-09-20 17:24         ` Sowmini Varadhan
  2016-09-22  5:52         ` David Miller
  2 siblings, 0 replies; 23+ messages in thread
From: Tom Herbert @ 2016-09-20 17:19 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Sowmini Varadhan, Linux Kernel Network Developers,
	David S. Miller, Hannes Frederic Sowa, Alexander Duyck,
	Daniel Borkmann, Paolo Abeni

On Tue, Sep 20, 2016 at 10:09 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Tue, 20 Sep 2016 18:43:33 +0200, Jiri Benc wrote:
>> On Tue, 20 Sep 2016 12:31:08 -0400, Sowmini Varadhan wrote:
>> > The vxlan header is after the ethernet header (14 bytes),
>> > IP header (20 bytes, assuming no options) and udp header (8 bytes).
>>
>> If the VXLAN header is not aligned to 4 bytes, then IP header is not
>> aligned to 4 bytes, too, and you have greater problems than just VXLAN.
>
> ...which is indeed the case. Sorry, I guess I'm too much focused on
> VXLAN-GPE nowadays and I missed that we're building this before the
> inner Ethernet header.
>
> But the point stands, we have much greater problems here than VXLAN.
> And I don't think that wrapping all IP address accesses into
> get/put_unaligned all around the stack is the solution.
>
RIght, this is problem with the encapsulations themselves. IP
protocols are supposed to be four-byte aligned and Ethernet
encapsulation is obviously breaking that. We can't address this in
existing standards like GRE-TEB, but for new encapsulations we should
make sure this avoided. I've posted on nvo3 list about this issue.

Tom

>  Jiri

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 17:09       ` Jiri Benc
  2016-09-20 17:19         ` Tom Herbert
@ 2016-09-20 17:24         ` Sowmini Varadhan
  2016-09-22  5:52         ` David Miller
  2 siblings, 0 replies; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-20 17:24 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni

On (09/20/16 19:09), Jiri Benc wrote:
> 
> But the point stands, we have much greater problems here than VXLAN.
> And I don't think that wrapping all IP address accesses into
> get/put_unaligned all around the stack is the solution.
> 

Agreed, and I think Tom made that point too. 
For the immediate pain, Hannes' suggestion of making vxlanhdr
__packed takes care of the noise from vxlan_build_skb, so I will
spin out a v2 that takes care of this one, at least.

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-20 17:09       ` Jiri Benc
  2016-09-20 17:19         ` Tom Herbert
  2016-09-20 17:24         ` Sowmini Varadhan
@ 2016-09-22  5:52         ` David Miller
  2016-09-22 21:30           ` Sowmini Varadhan
  2 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2016-09-22  5:52 UTC (permalink / raw)
  To: jbenc; +Cc: sowmini.varadhan, netdev, hannes, aduyck, daniel, pabeni

From: Jiri Benc <jbenc@redhat.com>
Date: Tue, 20 Sep 2016 19:09:29 +0200

> But the point stands, we have much greater problems here than VXLAN.
> And I don't think that wrapping all IP address accesses into
> get/put_unaligned all around the stack is the solution.

Right, and I don't like marking things as packed either.

We need something that really solves the problem.  We can't
change the existing protocols, but we can perhaps change the
geometry of the SKB when we deal with such protocols.

For example, we can memmove() to align the headers at skb->data and
then for the skb->data portion past the headers we insert a frag
pointing to it at the front of the frag list.

So we "memmove" down, creating a gap, and then past the gap
is the post-header area which gets inserted into the head of
the SKB's fraglist.

That will align all of the subsequent headers and avoid the
unaligned accesses after the vxlan header.

Alternatively we can do Alexander Duyck's trick, by pushing
the headers into the frag list, forcing a pull and realignment
by the next protocol layer.

This is so much better than the little hacks sprinkled all
over the problem and tackles the fundamental issue.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-22  5:52         ` David Miller
@ 2016-09-22 21:30           ` Sowmini Varadhan
  2016-09-23 12:06             ` David Miller
  0 siblings, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-22 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: jbenc, netdev, hannes, aduyck, daniel, pabeni

On (09/22/16 01:52), David Miller wrote:
> Alternatively we can do Alexander Duyck's trick, by pushing
> the headers into the frag list, forcing a pull and realignment
> by the next protocol layer.

What is the "Alexander Duyck trick" (hints about module or commit id,
where this can be found, please)?

Is this basically about, e.g., putting the vxlanhdr in its own
skb_frag_t, or something else?

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-22 21:30           ` Sowmini Varadhan
@ 2016-09-23 12:06             ` David Miller
  2016-09-23 14:17               ` Alexander Duyck
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2016-09-23 12:06 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: jbenc, netdev, hannes, aduyck, daniel, pabeni

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 22 Sep 2016 17:30:10 -0400

> On (09/22/16 01:52), David Miller wrote:
>> Alternatively we can do Alexander Duyck's trick, by pushing
>> the headers into the frag list, forcing a pull and realignment
>> by the next protocol layer.
> 
> What is the "Alexander Duyck trick" (hints about module or commit id,
> where this can be found, please)?
> 
> Is this basically about, e.g., putting the vxlanhdr in its own
> skb_frag_t, or something else?

Yes, and this way skb_header_pointer() is forced to do a memcpy.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-23 12:06             ` David Miller
@ 2016-09-23 14:17               ` Alexander Duyck
  2016-09-23 17:20                 ` Sowmini Varadhan
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Duyck @ 2016-09-23 14:17 UTC (permalink / raw)
  To: David Miller
  Cc: Sowmini Varadhan, Jiri Benc, Netdev, Hannes Frederic Sowa,
	Alexander Duyck, Daniel Borkmann, Paolo Abeni

On Fri, Sep 23, 2016 at 5:06 AM, David Miller <davem@davemloft.net> wrote:
> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Date: Thu, 22 Sep 2016 17:30:10 -0400
>
>> On (09/22/16 01:52), David Miller wrote:
>>> Alternatively we can do Alexander Duyck's trick, by pushing
>>> the headers into the frag list, forcing a pull and realignment
>>> by the next protocol layer.
>>
>> What is the "Alexander Duyck trick" (hints about module or commit id,
>> where this can be found, please)?
>>
>> Is this basically about, e.g., putting the vxlanhdr in its own
>> skb_frag_t, or something else?
>
> Yes, and this way skb_header_pointer() is forced to do a memcpy.

It is something I had proposed on the Rx side of things back in
February when the issue came up in relation to skb parsing.  On
architectures that require NET_IP_ALIGN to be set I had proposed that
we should only pull the outer headers from the page frag, and then
when the time is right we drop the outer headers and pull the inner
headers from the page frag.  That way we can keep all the headers
aligned.

For Tx it all becomes a bit trickier since it would likely require us
to shift the frags up by 1 when we go from outer headers to inner
headers.  One thought I had on that is that we could possibly avoid
having to do any allocation and could just take a reference on the
head_frag if that is what we are using.  Then we just add a 2 byte pad
and resume writing headers in place and the pointer offsets for the
inner headers would remain valid, though they would be past the point
of skb->tail.

- Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-23 14:17               ` Alexander Duyck
@ 2016-09-23 17:20                 ` Sowmini Varadhan
  2016-09-23 17:38                   ` Alexander Duyck
  0 siblings, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-23 17:20 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Jiri Benc, Netdev, Hannes Frederic Sowa,
	Alexander Duyck, Daniel Borkmann, Paolo Abeni

On (09/23/16 07:17), Alexander Duyck wrote:
> >> Is this basically about, e.g., putting the vxlanhdr in its own
> >> skb_frag_t, or something else?
> >
> > Yes, and this way skb_header_pointer() is forced to do a memcpy.
  :
> For Tx it all becomes a bit trickier since it would likely require us
> to shift the frags up by 1 when we go from outer headers to inner
> headers.  

here's how I thought through this so far, based on what I'm seeing for 
mld_newpack/vxlan (not sure if this can be extended for all the 
other tunnelling cases as well)..

today the skb is set up so that we reserve LL_RESERVED_SPACE
in the headroom, and vxlan sets up needed headroom for 
(outer_ether + ip + udp + vxlan + inner_ether). Insterad, if it
set up the needed_headroom for just (outer_ether, ip, udp) and
we had something like a "needed_fragroom" in the net_device, 
maybe we could set up the skb so that we dont have to shift the frags
by 1. 

Drawbacks: this ends up with every skb going through vxlan etc being
non-linear, so it is a lot of churn for several functions (e.g.,
even mld_newpack() cannot just skb_put() things around). Also
this probably gets quickly messy if we are dealing with multiple 
encaapsulations (even in the simple vxlan case we have 
vxlan + inner mac/ip/etc)

BTW, I wonder if there is a small vxlan bug here- are we
accounting for the outer_ether twice in LL_RESERVED_SPACE: once in 
->hard_header_len, and once in ->needed_headroom?

> One thought I had on that is that we could possibly avoid
> having to do any allocation and could just take a reference on the
> head_frag if that is what we are using.  Then we just add a 2 byte pad
> and resume writing headers in place and the pointer offsets for the
> inner headers would remain valid, though they would be past the point
> of skb->tail.

I am not sure I follow, can you elaborate? Doesnt this also assume
that every skb is necessarily non-linear?

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-23 17:20                 ` Sowmini Varadhan
@ 2016-09-23 17:38                   ` Alexander Duyck
  2016-09-23 23:41                     ` Sowmini Varadhan
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Duyck @ 2016-09-23 17:38 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, Jiri Benc, Netdev, Hannes Frederic Sowa,
	Alexander Duyck, Daniel Borkmann, Paolo Abeni

On Fri, Sep 23, 2016 at 10:20 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (09/23/16 07:17), Alexander Duyck wrote:
>> >> Is this basically about, e.g., putting the vxlanhdr in its own
>> >> skb_frag_t, or something else?
>> >
>> > Yes, and this way skb_header_pointer() is forced to do a memcpy.
>   :
>> For Tx it all becomes a bit trickier since it would likely require us
>> to shift the frags up by 1 when we go from outer headers to inner
>> headers.
>
> here's how I thought through this so far, based on what I'm seeing for
> mld_newpack/vxlan (not sure if this can be extended for all the
> other tunnelling cases as well)..
>
> today the skb is set up so that we reserve LL_RESERVED_SPACE
> in the headroom, and vxlan sets up needed headroom for
> (outer_ether + ip + udp + vxlan + inner_ether). Insterad, if it
> set up the needed_headroom for just (outer_ether, ip, udp) and
> we had something like a "needed_fragroom" in the net_device,
> maybe we could set up the skb so that we dont have to shift the frags
> by 1.
>
> Drawbacks: this ends up with every skb going through vxlan etc being
> non-linear, so it is a lot of churn for several functions (e.g.,
> even mld_newpack() cannot just skb_put() things around). Also
> this probably gets quickly messy if we are dealing with multiple
> encaapsulations (even in the simple vxlan case we have
> vxlan + inner mac/ip/etc)
>
> BTW, I wonder if there is a small vxlan bug here- are we
> accounting for the outer_ether twice in LL_RESERVED_SPACE: once in
> ->hard_header_len, and once in ->needed_headroom?
>
>> One thought I had on that is that we could possibly avoid
>> having to do any allocation and could just take a reference on the
>> head_frag if that is what we are using.  Then we just add a 2 byte pad
>> and resume writing headers in place and the pointer offsets for the
>> inner headers would remain valid, though they would be past the point
>> of skb->tail.
>
> I am not sure I follow, can you elaborate? Doesnt this also assume
> that every skb is necessarily non-linear?

So basically what I was thinking is we do something like reserving
NET_IP_ALIGN and continue writing headers to skb->data, but we force
the tracking for the inner headers into frag[0] so that we can keep
the inner headers aligned without messing up the alignment for outer
headers.  In theory the inner offset and all that would still be
functional but might need a few tweaks.  You could probably even use
the skb->encapsulation bit to indicate you are doing this.  You could
almost think of it as us doing something like the inverse of
pskb_pull_tail.  The general idea here is we want to actually leave
the data in skb->data, but just reference it from frag[0] so that we
don't accidentally pull in the 2 byte padding for alignment when
transmitting the frame.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-23 17:38                   ` Alexander Duyck
@ 2016-09-23 23:41                     ` Sowmini Varadhan
  2016-09-24  0:43                       ` Alexander Duyck
  0 siblings, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-23 23:41 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Jiri Benc, Netdev, Hannes Frederic Sowa,
	Alexander Duyck, Daniel Borkmann, Paolo Abeni

On (09/23/16 10:38), Alexander Duyck wrote:
> 
> So basically what I was thinking is we do something like reserving
> NET_IP_ALIGN and continue writing headers to skb->data, but we force
> the tracking for the inner headers into frag[0] so that we can keep
> the inner headers aligned without messing up the alignment for outer
> headers.  In theory the inner offset and all that would still be
> functional but might need a few tweaks.  You could probably even use
> the skb->encapsulation bit to indicate you are doing this.  You could
> almost think of it as us doing something like the inverse of
> pskb_pull_tail.  The general idea here is we want to actually leave
> the data in skb->data, but just reference it from frag[0] so that we
> don't accidentally pull in the 2 byte padding for alignment when
> transmitting the frame.

yes, I think something along this line could do the trick.. I tried
hacking it a bit today for vxlan, and it could be extended for all
these encaps protocols. Let me fix/test this more next week, maybe
we can discuss in Tokyo.

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-23 23:41                     ` Sowmini Varadhan
@ 2016-09-24  0:43                       ` Alexander Duyck
  2016-09-28 17:03                         ` Sowmini Varadhan
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Duyck @ 2016-09-24  0:43 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, Jiri Benc, Netdev, Hannes Frederic Sowa,
	Alexander Duyck, Daniel Borkmann, Paolo Abeni

On Fri, Sep 23, 2016 at 4:41 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (09/23/16 10:38), Alexander Duyck wrote:
>>
>> So basically what I was thinking is we do something like reserving
>> NET_IP_ALIGN and continue writing headers to skb->data, but we force
>> the tracking for the inner headers into frag[0] so that we can keep
>> the inner headers aligned without messing up the alignment for outer
>> headers.  In theory the inner offset and all that would still be
>> functional but might need a few tweaks.  You could probably even use
>> the skb->encapsulation bit to indicate you are doing this.  You could
>> almost think of it as us doing something like the inverse of
>> pskb_pull_tail.  The general idea here is we want to actually leave
>> the data in skb->data, but just reference it from frag[0] so that we
>> don't accidentally pull in the 2 byte padding for alignment when
>> transmitting the frame.
>
> yes, I think something along this line could do the trick.. I tried
> hacking it a bit today for vxlan, and it could be extended for all
> these encaps protocols. Let me fix/test this more next week, maybe
> we can discuss in Tokyo.

Agreed.  Keep in mind we only really need it for the architectures
that need to set NET_IP_ALIGN so we may want to end up wrapping the
code in ifndef checks for HAVE_EFFICIENT_UNALIGNED_ACCESS.

- Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-24  0:43                       ` Alexander Duyck
@ 2016-09-28 17:03                         ` Sowmini Varadhan
  2016-09-28 18:08                           ` Alexander Duyck
  0 siblings, 1 reply; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-28 17:03 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev, sowmini.varadhan

On (09/23/16 17:43), Alexander Duyck wrote:
> > On (09/23/16 10:38), Alexander Duyck wrote:
          ;
> >> almost think of it as us doing something like the inverse of
> >> pskb_pull_tail.  The general idea here is we want to actually leave
> >> the data in skb->data, but just reference it from frag[0] so that we
> >> don't accidentally pull in the 2 byte padding for alignment when
> >> transmitting the frame.

Some additional findings..

Just to recap how we got here: for the Rx path, the inner packet has
been set up as an ethernet frame with the IP header at an aligned address
when it hits vxlan_build_skb.  But that means the (inner) mac address
was offset by NET_IP_ALIGN so vxlan_build_skb needs to pad the data
by NET_IP_ALIGN to make the vxh outer ip header align. 

Then we'd need to do something like the suggestion above (keep some
pointers in frag[0]?  do the reverse of a pskb_expand_head to push out
the inner ip header to the skb_frag_t?), to have the driver skip over the
pad.. 

I tried the following for a hack, and it takes care of the tx side
unaligned access, though, clearly, the memmove needs to be avoided

@@ -1750,10 +1825,38 @@ static int vxlan_build_skb(struct sk_buff *skb, struct d
        if (err)
                goto out_free;
 
+
+#if (NET_IP_ALIGN != 0) 
+       {
+               unsigned char *data;
+
+               /* inner packet is an ethernet frame that was set up
+                * so that the IP header is aligned. But that means the
+                * mac address was offset by NET_IP_ALIGN, so we need
+                * to move things up so that the vxh and outer ip header
+                * are now aligned
+                * XXX The Alexander Duyck idea was to only do the
+                * extra __skb_push() for NET_IP_ALIGN, and avoid the
+                * extram memmove and ->inner* adjustments. Plus keep
+                * additional pointers in frag[0] and have the driver pick
+                * up pointers from frag[0] .. need to investigate
+                * that suggestion further.
+                */
+               data = skb->data;
+               skb->data -= NET_IP_ALIGN;
+               memmove(skb->data, data, skb_headlen(skb));
+               skb->inner_network_header -= NET_IP_ALIGN;
+               skb->inner_mac_header -= NET_IP_ALIGN;
+               skb->inner_transport_header -= NET_IP_ALIGN;
+       }
+#endif
        vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
        vxh->vx_flags = VXLAN_HF_VNI;
        vxh->vx_vni = vxlan_vni_field(vni);

In the general case (when the skb passed to vxlan_build_skb is already
non-linear), wouldn't we end up having to shift all the frags by 1 and/or
do some type of memory copy of the inner packet? However, I think
there are some clever things we can do in general, to avoid the memmove..

I also looked at the Rx path. Here the suggestion was:
"we should only pull the outer headers from the page frag, and then
 when the time is right we drop the outer headers and pull the inner
 headers from the page frag.  That way we can keep all the headers
 aligned."
I hacked up ixgbe_add_rx_frag to always only create nonlinear skb's, 
i.e., always avoid the
   memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
and then I end up with 
- ixgbe_clean_rx_irq copies outer header ether/ip/udp headers
  into linear part as needed, 
- then udp_gro_receive -> vxlan_gro_receive pulls up vxlan header
  into linear part, and then..
- eth_gro_receive pulls up another 34 bytes for the eth + ip header.
  This last pull ends up being unaligned. 
I dont know if we can safely drop the outer ip headers at this point
(have not tried this yet, and I'm not sure we can do this in all udp
encaps cases..)

one other possibility is to set up the inner frame as part of the 
->frag_list (note, this is *not* skb_frag_t). I suspect that is going
to cause other inefficiencies.

but, as tom has been saying all along, a big part of this problem is that
we are tripping up on the ethernet header in the middle of an
IP packet. Unfortunately I dont think the ietf is going to agree
to never ever do that, so I'm not sure we can win that architectural battle.
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-28 17:03                         ` Sowmini Varadhan
@ 2016-09-28 18:08                           ` Alexander Duyck
  2016-09-28 19:56                             ` Sowmini Varadhan
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander Duyck @ 2016-09-28 18:08 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: Netdev

On Wed, Sep 28, 2016 at 10:03 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (09/23/16 17:43), Alexander Duyck wrote:
>> > On (09/23/16 10:38), Alexander Duyck wrote:
>           ;
>> >> almost think of it as us doing something like the inverse of
>> >> pskb_pull_tail.  The general idea here is we want to actually leave
>> >> the data in skb->data, but just reference it from frag[0] so that we
>> >> don't accidentally pull in the 2 byte padding for alignment when
>> >> transmitting the frame.
>
> Some additional findings..
>
> Just to recap how we got here: for the Rx path, the inner packet has
> been set up as an ethernet frame with the IP header at an aligned address
> when it hits vxlan_build_skb.  But that means the (inner) mac address
> was offset by NET_IP_ALIGN so vxlan_build_skb needs to pad the data
> by NET_IP_ALIGN to make the vxh outer ip header align.
>
> Then we'd need to do something like the suggestion above (keep some
> pointers in frag[0]?  do the reverse of a pskb_expand_head to push out
> the inner ip header to the skb_frag_t?), to have the driver skip over the
> pad..
>
> I tried the following for a hack, and it takes care of the tx side
> unaligned access, though, clearly, the memmove needs to be avoided
>
> @@ -1750,10 +1825,38 @@ static int vxlan_build_skb(struct sk_buff *skb, struct d
>         if (err)
>                 goto out_free;
>
> +
> +#if (NET_IP_ALIGN != 0)
> +       {
> +               unsigned char *data;
> +
> +               /* inner packet is an ethernet frame that was set up
> +                * so that the IP header is aligned. But that means the
> +                * mac address was offset by NET_IP_ALIGN, so we need
> +                * to move things up so that the vxh and outer ip header
> +                * are now aligned
> +                * XXX The Alexander Duyck idea was to only do the
> +                * extra __skb_push() for NET_IP_ALIGN, and avoid the
> +                * extram memmove and ->inner* adjustments. Plus keep
> +                * additional pointers in frag[0] and have the driver pick
> +                * up pointers from frag[0] .. need to investigate
> +                * that suggestion further.
> +                */
> +               data = skb->data;
> +               skb->data -= NET_IP_ALIGN;
> +               memmove(skb->data, data, skb_headlen(skb));
> +               skb->inner_network_header -= NET_IP_ALIGN;
> +               skb->inner_mac_header -= NET_IP_ALIGN;
> +               skb->inner_transport_header -= NET_IP_ALIGN;
> +       }
> +#endif
>         vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
>         vxh->vx_flags = VXLAN_HF_VNI;
>         vxh->vx_vni = vxlan_vni_field(vni);
>
> In the general case (when the skb passed to vxlan_build_skb is already
> non-linear), wouldn't we end up having to shift all the frags by 1 and/or
> do some type of memory copy of the inner packet? However, I think
> there are some clever things we can do in general, to avoid the memmove..

Right, basically my idea was to just skip the memmove, pull the data
out and add pointers to this spot in the fraglist.  Doing that you
should be pulling tail back so it is equal to data.  Then you just do
an skb_reserve(skb, -NET_IP_ALIGN) and you should be all set to start
adding outer headers.  The problem is you end up having to disable any
offsets such as GSO or checksum offload since you can't really use the
inner header offsets anymore.

> I also looked at the Rx path. Here the suggestion was:
> "we should only pull the outer headers from the page frag, and then
>  when the time is right we drop the outer headers and pull the inner
>  headers from the page frag.  That way we can keep all the headers
>  aligned."
> I hacked up ixgbe_add_rx_frag to always only create nonlinear skb's,
> i.e., always avoid the
>    memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
> and then I end up with
> - ixgbe_clean_rx_irq copies outer header ether/ip/udp headers
>   into linear part as needed,
> - then udp_gro_receive -> vxlan_gro_receive pulls up vxlan header
>   into linear part, and then..

This is the point where we need to stop, drop the existing headers,
call skb_reserve(NET_IP_ALIGN), and then pick back up where we left
off.  We just have to make sure the skbuff isn't shared.

> - eth_gro_receive pulls up another 34 bytes for the eth + ip header.
>   This last pull ends up being unaligned.
> I dont know if we can safely drop the outer ip headers at this point
> (have not tried this yet, and I'm not sure we can do this in all udp
> encaps cases..)

In general the interface behind the tunnel shouldn't need to know
about any of the data in front of the tunnel, so you should be able to
drop the original header offsets and reset things if you want so you
could overwrite the old headers.

> one other possibility is to set up the inner frame as part of the
> ->frag_list (note, this is *not* skb_frag_t). I suspect that is going
> to cause other inefficiencies.

Actually I'm kind of wondering if this might not be the way to go
myself.  The overhead for making this kind of transition is sure to be
ugly though as it would require us to update a number of drivers to
support transmitting a fraglist, and we would have to update all the
header manipulation code so that we could realize that inner and outer
header existed in two separate buffers.  One advantage though would be
that we could get rid of all the "inner_" header bits from the sk_buff
since we could just use the header offsets stored in the frame hanging
off of the frag_list.

> but, as tom has been saying all along, a big part of this problem is that
> we are tripping up on the ethernet header in the middle of an
> IP packet. Unfortunately I dont think the ietf is going to agree
> to never ever do that, so I'm not sure we can win that architectural battle.

If I am not mistaken I think the multi-buffer approach is the approach
taken by other OSes, although for us it is more difficult since we
have the scatter-gather list that is frags, and then the chained
buffer list which is frag_list.  The other gotcha is determining how
many hardware vendors can support having the headers split over 2 DMA
requests.  I know in the case of i40e we would have to update the
driver so that the workaround to avoid exceeding 8 descriptors would
have to factor in the inner headers being split off.

I'm sure we are all going to be talking about this in great detail
next week at netdev/netconf.. :-)

- Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
  2016-09-28 18:08                           ` Alexander Duyck
@ 2016-09-28 19:56                             ` Sowmini Varadhan
  0 siblings, 0 replies; 23+ messages in thread
From: Sowmini Varadhan @ 2016-09-28 19:56 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Netdev

On (09/28/16 11:08), Alexander Duyck wrote:
> > - then udp_gro_receive -> vxlan_gro_receive pulls up vxlan header
> >   into linear part, and then..
> 
> This is the point where we need to stop, drop the existing headers,
> call skb_reserve(NET_IP_ALIGN), and then pick back up where we left
> off.  We just have to make sure the skbuff isn't shared.
  :
> In general the interface behind the tunnel shouldn't need to know
> about any of the data in front of the tunnel, so you should be able to
> drop the original header offsets and reset things if you want so you
> could overwrite the old headers.

Except that some of these encapsulations may be doing things like
[inner-mac, outer-ip] fdb learning, so we may not be able to lose
the outer headers at this early point in all cases.

> If I am not mistaken I think the multi-buffer approach is the approach
> taken by other OSes, although for us it is more difficult since we

yes, I think that's approximately what gets done with mblk/mbufs
(with pullups as needed). But as you point out, this type of
->frag_list chaining is a non-trivial change.

> I'm sure we are all going to be talking about this in great detail
> next week at netdev/netconf.. :-)

Agree.

--Sowmini

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-09-28 19:56 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-20 14:27 [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb() Sowmini Varadhan
2016-09-20 15:31 ` Tom Herbert
2016-09-20 15:49   ` Sowmini Varadhan
2016-09-20 16:11 ` Jiri Benc
2016-09-20 16:31   ` Sowmini Varadhan
2016-09-20 16:43     ` Jiri Benc
2016-09-20 17:07       ` Sowmini Varadhan
2016-09-20 17:15         ` Jiri Benc
2016-09-20 17:09       ` Jiri Benc
2016-09-20 17:19         ` Tom Herbert
2016-09-20 17:24         ` Sowmini Varadhan
2016-09-22  5:52         ` David Miller
2016-09-22 21:30           ` Sowmini Varadhan
2016-09-23 12:06             ` David Miller
2016-09-23 14:17               ` Alexander Duyck
2016-09-23 17:20                 ` Sowmini Varadhan
2016-09-23 17:38                   ` Alexander Duyck
2016-09-23 23:41                     ` Sowmini Varadhan
2016-09-24  0:43                       ` Alexander Duyck
2016-09-28 17:03                         ` Sowmini Varadhan
2016-09-28 18:08                           ` Alexander Duyck
2016-09-28 19:56                             ` Sowmini Varadhan
2016-09-20 16:45 ` Hannes Frederic Sowa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.