linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* net: fix routing encapsulated packets when binding a socket to a tunnel interface
@ 2019-03-22  6:29 lifonghsu
  2019-03-25 23:16 ` David Miller
  2019-03-27 20:44 ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: lifonghsu @ 2019-03-22  6:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, LiFong Hsu

From: LiFong Hsu <lifonghsu@synology.com>

When binding a socket to a 4in6/6in4 tunnel interface, the kernel sends
the packet to the tunnel interface without any problem, e.g.,
ping 8.8.8.8 -I 4in6. However, after the 4in6/6in4 tunnel encapsulation,
the encapsulated packet could be sent to the tunnel interface again when
some fields of the skb were changed in mangle table's output chain,
such as skb->mark and src/dest IP address. Sending to the tunnel interface
twice is unexpected, since there are no corresponding routing rules on
the tunnel interface for the encapsulated packet. Eventually, the encapsulated
packet will be dropped by the tunnel interface.

This commit stops referring to sk_bound_dev_if while re-routing a packet
with skb_iif!=0 which indicates that the packet has already been sent to
the interface specified by sk_bound_dev_if. Instead, this commit sends
the packet to the underlying network device.

Signed-off-by: LiFong Hsu <lifonghsu@synology.com>
Reviewed-by: JianJhen Chen <kchen@synology.com>
---
 net/ipv4/netfilter.c  | 2 +-
 net/ipv6/ip6_tunnel.c | 3 +++
 net/ipv6/netfilter.c  | 2 +-
 net/ipv6/sit.c        | 4 ++++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 8d2e5dc9..426c56b 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -41,7 +41,7 @@ int ip_route_me_harder(struct net *net, struct sk_buff *skb, unsigned int addr_t
 	fl4.daddr = iph->daddr;
 	fl4.saddr = saddr;
 	fl4.flowi4_tos = RT_TOS(iph->tos);
-	fl4.flowi4_oif = sk ? sk->sk_bound_dev_if : 0;
+	fl4.flowi4_oif = (sk && !skb->skb_iif) ? sk->sk_bound_dev_if : 0;
 	if (!fl4.flowi4_oif)
 		fl4.flowi4_oif = l3mdev_master_ifindex(dev);
 	fl4.flowi4_mark = skb->mark;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0c6403c..6997599 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1220,6 +1220,9 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	ipv6h->nexthdr = proto;
 	ipv6h->saddr = fl6->saddr;
 	ipv6h->daddr = fl6->daddr;
+
+	/* Reset the skb_iif to Tunnels interface index */
+	skb->skb_iif = dev->ifindex;
 	ip6tunnel_xmit(NULL, skb, dev);
 	return 0;
 tx_err_link_failure:
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 6d0b1f3..55cc431 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -26,7 +26,7 @@ int ip6_route_me_harder(struct net *net, struct sk_buff *skb)
 	int strict = (ipv6_addr_type(&iph->daddr) &
 		      (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
 	struct flowi6 fl6 = {
-		.flowi6_oif = sk && sk->sk_bound_dev_if ? sk->sk_bound_dev_if :
+		.flowi6_oif = (sk && sk->sk_bound_dev_if && !skb->skb_iif) ? sk->sk_bound_dev_if :
 			strict ? skb_dst(skb)->dev->ifindex : 0,
 		.flowi6_mark = skb->mark,
 		.flowi6_uid = sock_net_uid(net, sk),
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index e8a1dab..1112ef2 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -988,6 +988,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
+	/* Reset the skb_iif to Tunnels interface index */
+	skb->skb_iif = dev->ifindex;
 	iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
 		      df, !net_eq(tunnel->net, dev_net(dev)));
 	return NETDEV_TX_OK;
@@ -1011,6 +1013,8 @@ static netdev_tx_t sit_tunnel_xmit__(struct sk_buff *skb,
 
 	skb_set_inner_ipproto(skb, ipproto);
 
+	/* Reset the skb_iif to Tunnels interface index */
+	skb->skb_iif = dev->ifindex;
 	ip_tunnel_xmit(skb, dev, tiph, ipproto);
 	return NETDEV_TX_OK;
 tx_error:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-03-22  6:29 net: fix routing encapsulated packets when binding a socket to a tunnel interface lifonghsu
@ 2019-03-25 23:16 ` David Miller
  2019-03-27 20:44 ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2019-03-25 23:16 UTC (permalink / raw)
  To: lifonghsu; +Cc: netdev, linux-kernel

From: lifonghsu <lifonghsu@synology.com>
Date: Fri, 22 Mar 2019 14:29:58 +0800

> From: LiFong Hsu <lifonghsu@synology.com>
> 
> When binding a socket to a 4in6/6in4 tunnel interface, the kernel sends
> the packet to the tunnel interface without any problem, e.g.,
> ping 8.8.8.8 -I 4in6. However, after the 4in6/6in4 tunnel encapsulation,
> the encapsulated packet could be sent to the tunnel interface again when
> some fields of the skb were changed in mangle table's output chain,
> such as skb->mark and src/dest IP address. Sending to the tunnel interface
> twice is unexpected, since there are no corresponding routing rules on
> the tunnel interface for the encapsulated packet. Eventually, the encapsulated
> packet will be dropped by the tunnel interface.
> 
> This commit stops referring to sk_bound_dev_if while re-routing a packet
> with skb_iif!=0 which indicates that the packet has already been sent to
> the interface specified by sk_bound_dev_if. Instead, this commit sends
> the packet to the underlying network device.
> 
> Signed-off-by: LiFong Hsu <lifonghsu@synology.com>
> Reviewed-by: JianJhen Chen <kchen@synology.com>

skb->skb_iif is a receive side indication, why would it be changed on
transmit?

I see mac802154 doing this, but what it's doing is somewhat broken and
that doesn't come into play in your example.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-03-22  6:29 net: fix routing encapsulated packets when binding a socket to a tunnel interface lifonghsu
  2019-03-25 23:16 ` David Miller
@ 2019-03-27 20:44 ` David Miller
  2019-03-28  8:30   ` lifonghsu
  1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2019-03-27 20:44 UTC (permalink / raw)
  To: lifonghsu; +Cc: netdev, linux-kernel

From: lifonghsu <lifonghsu@synology.com>
Date: Fri, 22 Mar 2019 14:29:58 +0800

> From: LiFong Hsu <lifonghsu@synology.com>
> 
> When binding a socket to a 4in6/6in4 tunnel interface, the kernel sends
> the packet to the tunnel interface without any problem, e.g.,
> ping 8.8.8.8 -I 4in6. However, after the 4in6/6in4 tunnel encapsulation,
> the encapsulated packet could be sent to the tunnel interface again when
> some fields of the skb were changed in mangle table's output chain,
> such as skb->mark and src/dest IP address. Sending to the tunnel interface
> twice is unexpected, since there are no corresponding routing rules on
> the tunnel interface for the encapsulated packet. Eventually, the encapsulated
> packet will be dropped by the tunnel interface.
> 
> This commit stops referring to sk_bound_dev_if while re-routing a packet
> with skb_iif!=0 which indicates that the packet has already been sent to
> the interface specified by sk_bound_dev_if. Instead, this commit sends
> the packet to the underlying network device.
> 
> Signed-off-by: LiFong Hsu <lifonghsu@synology.com>
> Reviewed-by: JianJhen Chen <kchen@synology.com>

If you do not respond to my feedback, I am going to drop your patch.

You have 24 hours to respond before that happens.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-03-27 20:44 ` David Miller
@ 2019-03-28  8:30   ` lifonghsu
  2019-03-29 17:40     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: lifonghsu @ 2019-03-28  8:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

David Miller 於 2019-03-28 04:44 寫到:
> From: lifonghsu <lifonghsu@synology.com>
> Date: Fri, 22 Mar 2019 14:29:58 +0800
> 
>> From: LiFong Hsu <lifonghsu@synology.com>
>> 
>> When binding a socket to a 4in6/6in4 tunnel interface, the kernel 
>> sends
>> the packet to the tunnel interface without any problem, e.g.,
>> ping 8.8.8.8 -I 4in6. However, after the 4in6/6in4 tunnel 
>> encapsulation,
>> the encapsulated packet could be sent to the tunnel interface again 
>> when
>> some fields of the skb were changed in mangle table's output chain,
>> such as skb->mark and src/dest IP address. Sending to the tunnel 
>> interface
>> twice is unexpected, since there are no corresponding routing rules on
>> the tunnel interface for the encapsulated packet. Eventually, the 
>> encapsulated
>> packet will be dropped by the tunnel interface.
>> 
>> This commit stops referring to sk_bound_dev_if while re-routing a 
>> packet
>> with skb_iif!=0 which indicates that the packet has already been sent 
>> to
>> the interface specified by sk_bound_dev_if. Instead, this commit sends
>> the packet to the underlying network device.
>> 
>> Signed-off-by: LiFong Hsu <lifonghsu@synology.com>
>> Reviewed-by: JianJhen Chen <kchen@synology.com>
> 
> If you do not respond to my feedback, I am going to drop your patch.
> 
> You have 24 hours to respond before that happens.

Sorry for the late reply.

> skb->skb_iif is a receive side indication, why would it be changed on 
> transmit?
Indeed, skb_iif is used as receive site indication to present "device 
the packet arrived on".
This commit keeps the previous arrived device (similar to the concept of 
"device the packet arrived on") in skb_iif field to prevent kernel from 
referring sk_bound_dev_if again. Otherwise, we might need to add a new 
field to sk_buff structure for our purpose.

An example of how this commit works with 4in6 tunnel device is listed as 
follows:
socket bind 4in6 device (e.g. ping -I 4in6 8.8.8.8)
-> skb is transmitted to 4in6 device by referring sk_bound_dev_if
-> encapsulate the IPv4 packet to an IPv6 packet
-> "this commit sets skb_iif to 4in6 device as a flag"
-> ip6tunnel_xmit
-> re-route the encapsulated IPv6 packet
-> "this commit prevents kernel from referring sk_bound_dev_if again by 
checking skb_iif, so the encapsulated IPv6 packet will be sent to the 
underlying device instead of the 4in6 device"
-> underlying device

> I see mac802154 doing this, but what it's doing is somewhat broken and 
> that doesn't come into play in your example.

We have not tested mac802154, but we think this commit should be 
compatible with mac802154. Considering mac802154 is an underlying device 
in the above mentioned example. The skb_iif will be overwritten by 
mac802154 at the end. So the behavior of mac802154 should not be 
affected by this commit.

Thanks.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-03-28  8:30   ` lifonghsu
@ 2019-03-29 17:40     ` David Miller
  2019-04-01  3:00       ` lifonghsu
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2019-03-29 17:40 UTC (permalink / raw)
  To: lifonghsu; +Cc: netdev, linux-kernel

From: lifonghsu <lifonghsu@synology.com>
Date: Thu, 28 Mar 2019 16:30:37 +0800

> Indeed, skb_iif is used as receive site indication to present "device
> the packet arrived on".
> This commit keeps the previous arrived device (similar to the concept
> of "device the packet arrived on") in skb_iif field to prevent kernel
> from referring sk_bound_dev_if again. Otherwise, we might need to add
> a new field to sk_buff structure for our purpose.

Therefore, you are deciding to arbitrarily repurpose an RX side piece
of state for TX purposes.

Do not do this.

It confuses anyone trying to understand how skb_iif works.

You must use something with a different name, and clear semantics, to
achieve this goal.

For example, you could use an anonymous union:

	union {
		int	skb_iif;
		bool	bound_dev_already_applied;
	};

You never actually _USE_ the value of skb_iif, it is just merely a
boolean indicating whether sk_bound_dev_if was applied already.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-03-29 17:40     ` David Miller
@ 2019-04-01  3:00       ` lifonghsu
  2019-04-01  6:01         ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: lifonghsu @ 2019-04-01  3:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

David Miller 於 2019-03-30 01:40 寫到:
> From: lifonghsu <lifonghsu@synology.com>
> Date: Thu, 28 Mar 2019 16:30:37 +0800
> 
>> Indeed, skb_iif is used as receive site indication to present "device
>> the packet arrived on".
>> This commit keeps the previous arrived device (similar to the concept
>> of "device the packet arrived on") in skb_iif field to prevent kernel
>> from referring sk_bound_dev_if again. Otherwise, we might need to add
>> a new field to sk_buff structure for our purpose.
> 
> Therefore, you are deciding to arbitrarily repurpose an RX side piece
> of state for TX purposes.
> 
> Do not do this.
> 
> It confuses anyone trying to understand how skb_iif works.
> 
> You must use something with a different name, and clear semantics, to
> achieve this goal.
> 
> For example, you could use an anonymous union:
> 
> 	union {
> 		int	skb_iif;
> 		bool	bound_dev_already_applied;
> 	};
> 
> You never actually _USE_ the value of skb_iif, it is just merely a
> boolean indicating whether sk_bound_dev_if was applied already.

Since we are not sure whether there is any code referring to the value 
of
skb_iff in transmit site or not, it is risky to set an invalid interface
index to skb_iif in an anonymous union.
What if we add a bit (e.g., __u8 bound_dev_already_applied:1;) to 
sk_buff?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-04-01  3:00       ` lifonghsu
@ 2019-04-01  6:01         ` David Miller
  2019-04-01  9:52           ` lifonghsu
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2019-04-01  6:01 UTC (permalink / raw)
  To: lifonghsu; +Cc: netdev, linux-kernel

From: lifonghsu <lifonghsu@synology.com>
Date: Mon, 01 Apr 2019 11:00:22 +0800

> Since we are not sure whether there is any code referring to the
> value of skb_iff in transmit site or not, it is risky to set an
> invalid interface

If you are no confident in how the value is used on TX, then I can't
consider your change with confidence either.

Think about what you are saying to me here.  You basically do not know
whether you are adding a bug or not.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: fix routing encapsulated packets when binding a socket to a tunnel interface
  2019-04-01  6:01         ` David Miller
@ 2019-04-01  9:52           ` lifonghsu
  0 siblings, 0 replies; 8+ messages in thread
From: lifonghsu @ 2019-04-01  9:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel

David Miller 於 2019-04-01 14:01 寫到:
> From: lifonghsu <lifonghsu@synology.com>
> Date: Mon, 01 Apr 2019 11:00:22 +0800
> 
>> Since we are not sure whether there is any code referring to the
>> value of skb_iff in transmit site or not, it is risky to set an
>> invalid interface
> 
> If you are no confident in how the value is used on TX, then I can't
> consider your change with confidence either.
> 
> Think about what you are saying to me here.  You basically do not know
> whether you are adding a bug or not.

Indeed, we are not confident about the commit. Please drop it for now.
Thank you for help review the commit.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-01  9:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-22  6:29 net: fix routing encapsulated packets when binding a socket to a tunnel interface lifonghsu
2019-03-25 23:16 ` David Miller
2019-03-27 20:44 ` David Miller
2019-03-28  8:30   ` lifonghsu
2019-03-29 17:40     ` David Miller
2019-04-01  3:00       ` lifonghsu
2019-04-01  6:01         ` David Miller
2019-04-01  9:52           ` lifonghsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).