All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] IP_GRE: Linearize skb before csum.
@ 2013-01-22 21:20 Pravin B Shelar
  2013-01-22 22:03 ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Pravin B Shelar @ 2013-01-22 21:20 UTC (permalink / raw)
  To: netdev; +Cc: jesse, Pravin B Shelar

Make copy of skb sharable data by linearizing skb. so that
csum remain consistent when skb is actually transmitted.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/ipv4/ip_gre.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 7cdf1fe..20d3d37 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -738,7 +738,7 @@ drop:
 static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	const struct iphdr  *old_iph = ip_hdr(skb);
+	const struct iphdr  *old_iph;
 	const struct iphdr  *tiph;
 	struct flowi4 fl4;
 	u8     tos;
@@ -752,9 +752,23 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	int    mtu;
 	u8     ttl;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL &&
-	    skb_checksum_help(skb))
-		goto tx_error;
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		int err;
+
+		/* Pages aren't locked and could change at any time.
+		 * If this happens after we compute the checksum, the
+		 * checksum will be wrong.  We linearize now to avoid
+		 * this problem.
+		 */
+		err = __skb_linearize(skb);
+		if (unlikely(err))
+			goto tx_error;
+
+		err = skb_checksum_help(skb);
+		if (unlikely(err))
+			goto tx_error;
+	}
+	old_iph = ip_hdr(skb);
 
 	if (dev->type == ARPHRD_ETHER)
 		IPCB(skb)->flags = 0;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 21:20 [PATCH 2/2] IP_GRE: Linearize skb before csum Pravin B Shelar
@ 2013-01-22 22:03 ` Eric Dumazet
  2013-01-22 22:08   ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-01-22 22:03 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev, jesse

On Tue, 2013-01-22 at 13:20 -0800, Pravin B Shelar wrote:
> Make copy of skb sharable data by linearizing skb. so that
> csum remain consistent when skb is actually transmitted.
> 
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
>  net/ipv4/ip_gre.c |   22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 7cdf1fe..20d3d37 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -738,7 +738,7 @@ drop:
>  static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct ip_tunnel *tunnel = netdev_priv(dev);
> -	const struct iphdr  *old_iph = ip_hdr(skb);
> +	const struct iphdr  *old_iph;
>  	const struct iphdr  *tiph;
>  	struct flowi4 fl4;
>  	u8     tos;
> @@ -752,9 +752,23 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>  	int    mtu;
>  	u8     ttl;
>  
> -	if (skb->ip_summed == CHECKSUM_PARTIAL &&
> -	    skb_checksum_help(skb))
> -		goto tx_error;
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		int err;
> +
> +		/* Pages aren't locked and could change at any time.
> +		 * If this happens after we compute the checksum, the
> +		 * checksum will be wrong.  We linearize now to avoid
> +		 * this problem.
> +		 */
> +		err = __skb_linearize(skb);
> +		if (unlikely(err))
> +			goto tx_error;
> +
> +		err = skb_checksum_help(skb);
> +		if (unlikely(err))
> +			goto tx_error;
> +	}
> +	old_iph = ip_hdr(skb);
>  
>  	if (dev->type == ARPHRD_ETHER)
>  		IPCB(skb)->flags = 0;

This sounds bogus to me.

If user cant cope with changes on sent data, just disable GSO on GRE
device.

An application changing data provided on a sendfile() or vmsplice() cant
really expect data integrity being respected, even if checksum are done
by the NIC (TSO)

So if data integrity is not respected, just send a bogus TX checksum.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:03 ` Eric Dumazet
@ 2013-01-22 22:08   ` David Miller
  2013-01-22 22:15     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2013-01-22 22:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: pshelar, netdev, jesse

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 14:03:19 -0800

> An application changing data provided on a sendfile() or vmsplice() cant
> really expect data integrity being respected, even if checksum are done
> by the NIC (TSO)
> 
> So if data integrity is not respected, just send a bogus TX checksum.

I disagree.

As a quality of implementation decision, we should always compute
correct checksums, ragardless of whether the page contents can be
modified asynchronously during the packet transmit.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:08   ` David Miller
@ 2013-01-22 22:15     ` Eric Dumazet
  2013-01-22 22:30       ` Pravin Shelar
  2013-01-22 22:38       ` Jesse Gross
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2013-01-22 22:15 UTC (permalink / raw)
  To: David Miller; +Cc: pshelar, netdev, jesse

On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 22 Jan 2013 14:03:19 -0800
> 
> > An application changing data provided on a sendfile() or vmsplice() cant
> > really expect data integrity being respected, even if checksum are done
> > by the NIC (TSO)
> > 
> > So if data integrity is not respected, just send a bogus TX checksum.
> 
> I disagree.
> 
> As a quality of implementation decision, we should always compute
> correct checksums, ragardless of whether the page contents can be
> modified asynchronously during the packet transmit.

The retransmits will compute the correct checksums.

For a very unlikely operation from the user, we want to disable GSO on
GRE.

If so, just revert my GSO patch.

This is plain stupid to cook GSO packets and then linearize them, adding
an extra copy and a point of failure, as allocating a 128K skb->head
will probably fail very easily.

Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are
not.

This patch is absolutely wrong.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:15     ` Eric Dumazet
@ 2013-01-22 22:30       ` Pravin Shelar
  2013-01-22 22:38       ` Jesse Gross
  1 sibling, 0 replies; 15+ messages in thread
From: Pravin Shelar @ 2013-01-22 22:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jesse Gross

On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 22 Jan 2013 14:03:19 -0800
>>
>> > An application changing data provided on a sendfile() or vmsplice() cant
>> > really expect data integrity being respected, even if checksum are done
>> > by the NIC (TSO)
>> >
>> > So if data integrity is not respected, just send a bogus TX checksum.
>>
>> I disagree.
>>
>> As a quality of implementation decision, we should always compute
>> correct checksums, ragardless of whether the page contents can be
>> modified asynchronously during the packet transmit.
>
> The retransmits will compute the correct checksums.
>
> For a very unlikely operation from the user, we want to disable GSO on
> GRE.
>
> If so, just revert my GSO patch.
>
> This is plain stupid to cook GSO packets and then linearize them, adding
> an extra copy and a point of failure, as allocating a 128K skb->head
> will probably fail very easily.
>
OK.
This overhead can be avoided with GRE segmentation offload. I will
send that series soon.

> Now, tcp_sendmsg() cooked skb are fine, and this patch assumes they are
> not.
>
> This patch is absolutely wrong.
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:15     ` Eric Dumazet
  2013-01-22 22:30       ` Pravin Shelar
@ 2013-01-22 22:38       ` Jesse Gross
  2013-01-22 22:44         ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Jesse Gross @ 2013-01-22 22:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, pshelar, netdev

On Tue, Jan 22, 2013 at 2:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-01-22 at 17:08 -0500, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 22 Jan 2013 14:03:19 -0800
>>
>> > An application changing data provided on a sendfile() or vmsplice() cant
>> > really expect data integrity being respected, even if checksum are done
>> > by the NIC (TSO)
>> >
>> > So if data integrity is not respected, just send a bogus TX checksum.
>>
>> I disagree.
>>
>> As a quality of implementation decision, we should always compute
>> correct checksums, ragardless of whether the page contents can be
>> modified asynchronously during the packet transmit.
>
> The retransmits will compute the correct checksums.
>
> For a very unlikely operation from the user, we want to disable GSO on
> GRE.

We're currently enforcing this assumption in the rest of the network
stack - it's why we mask out scatter/gather capability in the NIC if
it isn't capable of checksumming the packet.

Packets with asynchronous changes may come from VMs, so it isn't
necessarily reasonable to tell people that they need to disable
offloads with certain use cases.

As Pravin said, pushing down the GSO to the lowest layer is the best
way to solve the problem.  However, I would argue that the current
behavior is not correct.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:38       ` Jesse Gross
@ 2013-01-22 22:44         ` Eric Dumazet
  2013-01-23  0:41           ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-01-22 22:44 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, pshelar, netdev

On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote:

> 
> We're currently enforcing this assumption in the rest of the network
> stack - it's why we mask out scatter/gather capability in the NIC if
> it isn't capable of checksumming the packet.
> 
> Packets with asynchronous changes may come from VMs, so it isn't
> necessarily reasonable to tell people that they need to disable
> offloads with certain use cases.
> 
> As Pravin said, pushing down the GSO to the lowest layer is the best
> way to solve the problem.  However, I would argue that the current
> behavior is not correct.

You do understand this problem is generic to GSO ?

You basically are saying GSO should be removed.

If you really care, please find another way to address the problem.

Frames build by tcp_sendmsg() are fine : their content cannot be changed
by the user.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-22 22:44         ` Eric Dumazet
@ 2013-01-23  0:41           ` David Miller
  2013-01-23  1:08             ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2013-01-23  0:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, pshelar, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 14:44:13 -0800

> On Tue, 2013-01-22 at 14:38 -0800, Jesse Gross wrote:
> 
>> 
>> We're currently enforcing this assumption in the rest of the network
>> stack - it's why we mask out scatter/gather capability in the NIC if
>> it isn't capable of checksumming the packet.
>> 
>> Packets with asynchronous changes may come from VMs, so it isn't
>> necessarily reasonable to tell people that they need to disable
>> offloads with certain use cases.
>> 
>> As Pravin said, pushing down the GSO to the lowest layer is the best
>> way to solve the problem.  However, I would argue that the current
>> behavior is not correct.
> 
> You do understand this problem is generic to GSO ?
> 
> You basically are saying GSO should be removed.
> 
> If you really care, please find another way to address the problem.
> 
> Frames build by tcp_sendmsg() are fine : their content cannot be changed
> by the user.

We don't emit crap onto the wire knowingly.  Jesse is right.

If you want to do software GSO in situations where we know that the
paged data cannot be modified asynchronously, you'll have to
explicitly support that.

I will not accept us saying that allowing the emission of bad
checksums is OK.  It never is.  That's terrible behavior, and creates
impossible to disagnose problems.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  0:41           ` David Miller
@ 2013-01-23  1:08             ` Eric Dumazet
  2013-01-23  1:46               ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-01-23  1:08 UTC (permalink / raw)
  To: David Miller; +Cc: jesse, pshelar, netdev

On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote:

> We don't emit crap onto the wire knowingly.  Jesse is right.
> 
> If you want to do software GSO in situations where we know that the
> paged data cannot be modified asynchronously, you'll have to
> explicitly support that.
> 
> I will not accept us saying that allowing the emission of bad
> checksums is OK.  It never is.  That's terrible behavior, and creates
> impossible to disagnose problems.

How is this related to GRE only ?

We have dozen of skb_checksum_help() calls that basically have the same
issue.

Am I missing some obvious thing ?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  1:08             ` Eric Dumazet
@ 2013-01-23  1:46               ` David Miller
  2013-01-23  4:22                 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2013-01-23  1:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, pshelar, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 17:08:13 -0800

> On Tue, 2013-01-22 at 19:41 -0500, David Miller wrote:
> 
>> We don't emit crap onto the wire knowingly.  Jesse is right.
>> 
>> If you want to do software GSO in situations where we know that the
>> paged data cannot be modified asynchronously, you'll have to
>> explicitly support that.
>> 
>> I will not accept us saying that allowing the emission of bad
>> checksums is OK.  It never is.  That's terrible behavior, and creates
>> impossible to disagnose problems.
> 
> How is this related to GRE only ?
> 
> We have dozen of skb_checksum_help() calls that basically have the same
> issue.
> 
> Am I missing some obvious thing ?

Then all of them need fixing.

We can't emit bogus checksums on the wire when this is completely
under our control.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  1:46               ` David Miller
@ 2013-01-23  4:22                 ` David Miller
  2013-01-23  5:02                   ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2013-01-23  4:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, pshelar, netdev

From: David Miller <davem@davemloft.net>
Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST)

> We can't emit bogus checksums on the wire when this is completely
> under our control.

BTW consider a legitimate use case where an application has a shared
memory page between threads that keeps track of statistic counters,
and it uses sendfile() over an fd backing that mmap()'d area to
send the statistics out remotely over TCP.

That's completely legitimate.

As is an application constantly overwriting a page in the filesystem
page cache, and another app doing sendfile() over TCP from that
area.

In such a situation everyone of your magic retransmits will have a
bad checksum too.

You can't say that this behavior is anything other than broken, and
I don't want to be responsible for a system that's doing broken
things like that.

If your GSO over GRE tunnels stuff fundamentally depends upon this
magic retransmit fantasy working, then yes we have to disable it
completely.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  4:22                 ` David Miller
@ 2013-01-23  5:02                   ` Eric Dumazet
  2013-01-23  5:19                     ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-01-23  5:02 UTC (permalink / raw)
  To: David Miller; +Cc: jesse, pshelar, netdev

On Tue, 2013-01-22 at 23:22 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 22 Jan 2013 20:46:29 -0500 (EST)
> 
> > We can't emit bogus checksums on the wire when this is completely
> > under our control.
> 
> BTW consider a legitimate use case where an application has a shared
> memory page between threads that keeps track of statistic counters,
> and it uses sendfile() over an fd backing that mmap()'d area to
> send the statistics out remotely over TCP.
> 
> That's completely legitimate.
> 
> As is an application constantly overwriting a page in the filesystem
> page cache, and another app doing sendfile() over TCP from that
> area.
> 
> In such a situation everyone of your magic retransmits will have a
> bad checksum too.
> 
> You can't say that this behavior is anything other than broken, and
> I don't want to be responsible for a system that's doing broken
> things like that.
> 
> If your GSO over GRE tunnels stuff fundamentally depends upon this
> magic retransmit fantasy working, then yes we have to disable it
> completely.
> --

But then GSO should be disabled right now, GRE or not.

We have a generic problem.

GSO was a bad idea from the very beginning, if you want
such use case. 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  5:02                   ` Eric Dumazet
@ 2013-01-23  5:19                     ` David Miller
  2013-01-23  5:30                       ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2013-01-23  5:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, pshelar, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 21:02:53 -0800

> GSO was a bad idea from the very beginning, if you want
> such use case. 

It's perfectly fine when the card checksums the packet.  Because it
will checksum the exact data that will hit the wire and thus the
checksum will be correct always.

It's only buggy when we do pure software GSO and have page frags.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  5:19                     ` David Miller
@ 2013-01-23  5:30                       ` Eric Dumazet
  2013-01-23  5:36                         ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-01-23  5:30 UTC (permalink / raw)
  To: David Miller; +Cc: jesse, pshelar, netdev

On Wed, 2013-01-23 at 00:19 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 22 Jan 2013 21:02:53 -0800
> 
> > GSO was a bad idea from the very beginning, if you want
> > such use case. 
> 
> It's perfectly fine when the card checksums the packet.  Because it
> will checksum the exact data that will hit the wire and thus the
> checksum will be correct always.
> 
> It's only buggy when we do pure software GSO and have page frags.

Right.

We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra
copy of frags, instead of a linearize call.

Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this
flag.

Its a bit late here, I'll do that tomorrow.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] IP_GRE: Linearize skb before csum.
  2013-01-23  5:30                       ` Eric Dumazet
@ 2013-01-23  5:36                         ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2013-01-23  5:36 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jesse, pshelar, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 21:30:50 -0800

> We could add a new SKB_GSO_SHARED_FRAG bit for that, and force an extra
> copy of frags, instead of a linearize call.
> 
> Basically, splice(pipe -> socket)/sendfile()/vmsplice() should set this
> flag.

Right.

> Its a bit late here, I'll do that tomorrow.

Thank you.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-01-23  5:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-22 21:20 [PATCH 2/2] IP_GRE: Linearize skb before csum Pravin B Shelar
2013-01-22 22:03 ` Eric Dumazet
2013-01-22 22:08   ` David Miller
2013-01-22 22:15     ` Eric Dumazet
2013-01-22 22:30       ` Pravin Shelar
2013-01-22 22:38       ` Jesse Gross
2013-01-22 22:44         ` Eric Dumazet
2013-01-23  0:41           ` David Miller
2013-01-23  1:08             ` Eric Dumazet
2013-01-23  1:46               ` David Miller
2013-01-23  4:22                 ` David Miller
2013-01-23  5:02                   ` Eric Dumazet
2013-01-23  5:19                     ` David Miller
2013-01-23  5:30                       ` Eric Dumazet
2013-01-23  5:36                         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.