All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
@ 2017-01-28 14:26 ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

v1->v2:
- patch 1:
  put sk_dst_pending_confirm in TX cacheline
- patch 2:
  add skb_set_dst_pending_confirm and skb_get_dst_pending_confirm helpers
- patch 3:
  use skb_set_dst_pending_confirm
  remove check from sctp_transport_dst_confirm, directly assign
- patch 4:
  use skb_set_dst_pending_confirm
- patch 6:
  use skb_set_dst_pending_confirm

	This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.

	Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).

	At first look, there are more problems:

- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used

- DNAT can change the nexthop, so the really used nexthop is
not confirmed

	So, the following solution is to avoid using
dst->pending_confirm.

	The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect


	The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag

- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm

2. net: add dst_pending_confirm flag to skbuff

- skb->dst_pending_confirm will be used by all protocols
in following patches, via skb_{set,get}_dst_pending_confirm

3. sctp: add dst_pending_confirm flag

- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP

4. tcp: replace dst_confirm with sk_dst_confirm

5. net: add confirm_neigh method to dst_ops

- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.

- xfrm_confirm_neigh: support is incomplete here, only routes with
known nexthops (gateway) are supported because the tunnel address
is slow to obtain. Or there is solution to this problem?

6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6

- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).

- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.

7. net: pending_confirm is not used anymore

- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?

- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?


Julian Anastasov (7):
  sock: add sk_dst_pending_confirm flag
  net: add dst_pending_confirm flag to skbuff
  sctp: add dst_pending_confirm flag
  tcp: replace dst_confirm with sk_dst_confirm
  net: add confirm_neigh method to dst_ops
  net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  net: pending_confirm is not used anymore

 drivers/net/vrf.c          |  5 ++++-
 include/linux/skbuff.h     | 12 ++++++++++++
 include/net/arp.h          | 16 ++++++++++++++++
 include/net/dst.h          | 21 +++++++++------------
 include/net/dst_ops.h      |  2 ++
 include/net/ndisc.h        | 17 +++++++++++++++++
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 include/net/sock.h         | 26 ++++++++++++++++++++++++++
 net/core/dst.c             |  1 -
 net/core/sock.c            |  2 ++
 net/ipv4/ip_output.c       | 11 ++++++++++-
 net/ipv4/ping.c            |  3 ++-
 net/ipv4/raw.c             |  6 +++++-
 net/ipv4/route.c           | 19 +++++++++++++++++++
 net/ipv4/tcp_input.c       | 12 +++---------
 net/ipv4/tcp_metrics.c     |  7 ++-----
 net/ipv4/tcp_output.c      |  2 ++
 net/ipv4/udp.c             |  3 ++-
 net/ipv6/ip6_output.c      |  7 +++++++
 net/ipv6/raw.c             |  6 +++++-
 net/ipv6/route.c           | 43 ++++++++++++++++++++++++++++++-------------
 net/ipv6/udp.c             |  3 ++-
 net/l2tp/l2tp_ip6.c        |  3 ++-
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 17 ++++++++++++++++-
 net/xfrm/xfrm_policy.c     | 16 ++++++++++++++++
 32 files changed, 233 insertions(+), 64 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
@ 2017-01-28 14:26 ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

v1->v2:
- patch 1:
  put sk_dst_pending_confirm in TX cacheline
- patch 2:
  add skb_set_dst_pending_confirm and skb_get_dst_pending_confirm helpers
- patch 3:
  use skb_set_dst_pending_confirm
  remove check from sctp_transport_dst_confirm, directly assign
- patch 4:
  use skb_set_dst_pending_confirm
- patch 6:
  use skb_set_dst_pending_confirm

	This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.

	Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).

	At first look, there are more problems:

- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used

- DNAT can change the nexthop, so the really used nexthop is
not confirmed

	So, the following solution is to avoid using
dst->pending_confirm.

	The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect


	The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag

- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm

2. net: add dst_pending_confirm flag to skbuff

- skb->dst_pending_confirm will be used by all protocols
in following patches, via skb_{set,get}_dst_pending_confirm

3. sctp: add dst_pending_confirm flag

- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP

4. tcp: replace dst_confirm with sk_dst_confirm

5. net: add confirm_neigh method to dst_ops

- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.

- xfrm_confirm_neigh: support is incomplete here, only routes with
known nexthops (gateway) are supported because the tunnel address
is slow to obtain. Or there is solution to this problem?

6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6

- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).

- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.

7. net: pending_confirm is not used anymore

- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?

- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?


Julian Anastasov (7):
  sock: add sk_dst_pending_confirm flag
  net: add dst_pending_confirm flag to skbuff
  sctp: add dst_pending_confirm flag
  tcp: replace dst_confirm with sk_dst_confirm
  net: add confirm_neigh method to dst_ops
  net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  net: pending_confirm is not used anymore

 drivers/net/vrf.c          |  5 ++++-
 include/linux/skbuff.h     | 12 ++++++++++++
 include/net/arp.h          | 16 ++++++++++++++++
 include/net/dst.h          | 21 +++++++++------------
 include/net/dst_ops.h      |  2 ++
 include/net/ndisc.h        | 17 +++++++++++++++++
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 include/net/sock.h         | 26 ++++++++++++++++++++++++++
 net/core/dst.c             |  1 -
 net/core/sock.c            |  2 ++
 net/ipv4/ip_output.c       | 11 ++++++++++-
 net/ipv4/ping.c            |  3 ++-
 net/ipv4/raw.c             |  6 +++++-
 net/ipv4/route.c           | 19 +++++++++++++++++++
 net/ipv4/tcp_input.c       | 12 +++---------
 net/ipv4/tcp_metrics.c     |  7 ++-----
 net/ipv4/tcp_output.c      |  2 ++
 net/ipv4/udp.c             |  3 ++-
 net/ipv6/ip6_output.c      |  7 +++++++
 net/ipv6/raw.c             |  6 +++++-
 net/ipv6/route.c           | 43 ++++++++++++++++++++++++++++++-------------
 net/ipv6/udp.c             |  3 ++-
 net/l2tp/l2tp_ip6.c        |  3 ++-
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 17 ++++++++++++++++-
 net/xfrm/xfrm_policy.c     | 16 ++++++++++++++++
 32 files changed, 233 insertions(+), 64 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 1/7] sock: add sk_dst_pending_confirm flag
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sock.h | 12 ++++++++++++
 net/core/sock.c    |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7144750..e113786 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -240,6 +240,7 @@ struct sock_common {
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
+  *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
@@ -393,6 +394,8 @@ struct sock {
 	struct sk_buff_head	sk_write_queue;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
+	__u32			sk_dst_pending_confirm;
+	/* Note: 32bit hole on 64bit arches */
 	long			sk_sndtimeo;
 	struct timer_list	sk_timer;
 	__u32			sk_priority;
@@ -1764,6 +1767,7 @@ static inline void dst_negative_advice(struct sock *sk)
 		if (ndst != dst) {
 			rcu_assign_pointer(sk->sk_dst_cache, ndst);
 			sk_tx_queue_clear(sk);
+			sk->sk_dst_pending_confirm = 0;
 		}
 	}
 }
@@ -1774,6 +1778,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	/*
 	 * This can be called while sk is owned by the caller only,
 	 * with no state that can be checked in a rcu_dereference_check() cond
@@ -1789,6 +1794,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst);
 	dst_release(old_dst);
 }
@@ -1809,6 +1815,12 @@ static inline void dst_negative_advice(struct sock *sk)
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+static inline void sk_dst_confirm(struct sock *sk)
+{
+	if (!sk->sk_dst_pending_confirm)
+		sk->sk_dst_pending_confirm = 1;
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/core/sock.c b/net/core/sock.c
index 8b35debf..b743565 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -502,6 +502,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
 		sk_tx_queue_clear(sk);
+		sk->sk_dst_pending_confirm = 0;
 		RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
 		dst_release(dst);
 		return NULL;
@@ -1519,6 +1520,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 				af_family_clock_key_strings[newsk->sk_family]);
 
 		newsk->sk_dst_cache	= NULL;
+		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
 		atomic_set(&newsk->sk_drops, 0);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 1/7] sock: add sk_dst_pending_confirm flag
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sock.h | 12 ++++++++++++
 net/core/sock.c    |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7144750..e113786 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -240,6 +240,7 @@ struct sock_common {
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
+  *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
@@ -393,6 +394,8 @@ struct sock {
 	struct sk_buff_head	sk_write_queue;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
+	__u32			sk_dst_pending_confirm;
+	/* Note: 32bit hole on 64bit arches */
 	long			sk_sndtimeo;
 	struct timer_list	sk_timer;
 	__u32			sk_priority;
@@ -1764,6 +1767,7 @@ static inline void dst_negative_advice(struct sock *sk)
 		if (ndst != dst) {
 			rcu_assign_pointer(sk->sk_dst_cache, ndst);
 			sk_tx_queue_clear(sk);
+			sk->sk_dst_pending_confirm = 0;
 		}
 	}
 }
@@ -1774,6 +1778,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	/*
 	 * This can be called while sk is owned by the caller only,
 	 * with no state that can be checked in a rcu_dereference_check() cond
@@ -1789,6 +1794,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst);
 	dst_release(old_dst);
 }
@@ -1809,6 +1815,12 @@ static inline void dst_negative_advice(struct sock *sk)
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+static inline void sk_dst_confirm(struct sock *sk)
+{
+	if (!sk->sk_dst_pending_confirm)
+		sk->sk_dst_pending_confirm = 1;
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/core/sock.c b/net/core/sock.c
index 8b35debf..b743565 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -502,6 +502,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) = NULL) {
 		sk_tx_queue_clear(sk);
+		sk->sk_dst_pending_confirm = 0;
 		RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
 		dst_release(dst);
 		return NULL;
@@ -1519,6 +1520,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 				af_family_clock_key_strings[newsk->sk_family]);
 
 		newsk->sk_dst_cache	= NULL;
+		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
 		atomic_set(&newsk->sk_drops, 0);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 drivers/net/vrf.c      |  5 ++++-
 include/linux/skbuff.h | 12 ++++++++++++
 include/net/sock.h     | 14 ++++++++++++++
 net/ipv4/ip_output.c   |  5 ++++-
 net/ipv6/ip6_output.c  |  1 +
 5 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 264fc15..630eafd 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -378,6 +378,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
@@ -574,8 +575,10 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
-	if (!IS_ERR(neigh))
+	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
+	}
 
 	rcu_read_unlock_bh();
 err:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6f63b7e..3ac3c3b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -613,6 +613,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
+ *	@dst_pending_confirm: need to confirm neighbour
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
@@ -743,6 +744,7 @@ struct sk_buff {
 	__u8			csum_level:2;
 	__u8			csum_bad:1;
 
+	__u8			dst_pending_confirm:1;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8			ndisc_nodetype:2;
 #endif
@@ -3694,6 +3696,16 @@ static inline bool skb_rx_queue_recorded(const struct sk_buff *skb)
 	return skb->queue_mapping != 0;
 }
 
+static inline void skb_set_dst_pending_confirm(struct sk_buff *skb, u32 val)
+{
+	skb->dst_pending_confirm = val;
+}
+
+static inline bool skb_get_dst_pending_confirm(const struct sk_buff *skb)
+{
+	return skb->dst_pending_confirm != 0;
+}
+
 static inline struct sec_path *skb_sec_path(struct sk_buff *skb)
 {
 #ifdef CONFIG_XFRM
diff --git a/include/net/sock.h b/include/net/sock.h
index e113786..1bc821e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1821,6 +1821,20 @@ static inline void sk_dst_confirm(struct sock *sk)
 		sk->sk_dst_pending_confirm = 1;
 }
 
+static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
+{
+	if (skb_get_dst_pending_confirm(skb)) {
+		struct sock *sk = skb->sk;
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+		if (sk && sk->sk_dst_pending_confirm)
+			sk->sk_dst_pending_confirm = 0;
+	}
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fac275c4..27f1db7 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -222,7 +222,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		sock_confirm_neigh(skb, neigh);
+		res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 38122d0..7d90cab 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -119,6 +119,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 drivers/net/vrf.c      |  5 ++++-
 include/linux/skbuff.h | 12 ++++++++++++
 include/net/sock.h     | 14 ++++++++++++++
 net/ipv4/ip_output.c   |  5 ++++-
 net/ipv6/ip6_output.c  |  1 +
 5 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 264fc15..630eafd 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -378,6 +378,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
@@ -574,8 +575,10 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
-	if (!IS_ERR(neigh))
+	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
+	}
 
 	rcu_read_unlock_bh();
 err:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6f63b7e..3ac3c3b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -613,6 +613,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
+ *	@dst_pending_confirm: need to confirm neighbour
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
@@ -743,6 +744,7 @@ struct sk_buff {
 	__u8			csum_level:2;
 	__u8			csum_bad:1;
 
+	__u8			dst_pending_confirm:1;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8			ndisc_nodetype:2;
 #endif
@@ -3694,6 +3696,16 @@ static inline bool skb_rx_queue_recorded(const struct sk_buff *skb)
 	return skb->queue_mapping != 0;
 }
 
+static inline void skb_set_dst_pending_confirm(struct sk_buff *skb, u32 val)
+{
+	skb->dst_pending_confirm = val;
+}
+
+static inline bool skb_get_dst_pending_confirm(const struct sk_buff *skb)
+{
+	return skb->dst_pending_confirm != 0;
+}
+
 static inline struct sec_path *skb_sec_path(struct sk_buff *skb)
 {
 #ifdef CONFIG_XFRM
diff --git a/include/net/sock.h b/include/net/sock.h
index e113786..1bc821e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1821,6 +1821,20 @@ static inline void sk_dst_confirm(struct sock *sk)
 		sk->sk_dst_pending_confirm = 1;
 }
 
+static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
+{
+	if (skb_get_dst_pending_confirm(skb)) {
+		struct sock *sk = skb->sk;
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+		if (sk && sk->sk_dst_pending_confirm)
+			sk->sk_dst_pending_confirm = 0;
+	}
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fac275c4..27f1db7 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -222,7 +222,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		sock_confirm_neigh(skb, neigh);
+		res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 38122d0..7d90cab 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -119,6 +119,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 3/7] sctp: add dst_pending_confirm flag
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 17 ++++++++++++++++-
 9 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 3cfd365b..480b65a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -593,10 +593,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
  */
 static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
 {
-	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
-		dst_release(t->dst);
-		t->dst = NULL;
-	}
+	if (t->dst && !dst_check(t->dst, t->dst_cookie))
+		sctp_transport_dst_release(t);
 
 	return t->dst;
 }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 231fa9ac..6a68504 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -804,6 +804,8 @@ struct sctp_transport {
 
 	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
 
+	__u32 dst_pending_confirm;	/* need to confirm neighbour */
+
 	/* Destination */
 	struct dst_entry *dst;
 	/* Source address. */
@@ -950,6 +952,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 void sctp_transport_reset(struct sctp_transport *);
 void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
+void sctp_transport_dst_release(struct sctp_transport *t);
+void sctp_transport_dst_confirm(struct sctp_transport *t);
 
 
 /* This is the structure we use to queue packets as they come into
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index e50dc6d..2a6835b 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -832,8 +832,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		if (transport->state != SCTP_UNCONFIRMED)
 			transport->state = SCTP_INACTIVE;
 		else {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 			ulp_notify = false;
 		}
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 07ab506..814eac0 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -546,6 +546,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 	struct sctp_association *asoc = tp->asoc;
 	struct sctp_chunk *chunk, *tmp;
 	int pkt_count, gso = 0;
+	int confirm;
 	struct dst_entry *dst;
 	struct sk_buff *head;
 	struct sctphdr *sh;
@@ -624,7 +625,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 			asoc->peer.last_sent_to = tp;
 	}
 	head->ignore_df = packet->ipfragok;
-	tp->af_specific->sctp_xmit(head, tp);
+	confirm = tp->dst_pending_confirm;
+	if (confirm)
+		skb_set_dst_pending_confirm(head, 1);
+	/* neighbour should be confirmed on successful transmission or
+	 * positive error
+	 */
+	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
+		tp->dst_pending_confirm = 0;
 
 out:
 	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 65abe22..db352e5 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1654,7 +1654,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 
 		if (forward_progress) {
 			if (transport->dst)
-				dst_confirm(transport->dst);
+				sctp_transport_dst_confirm(transport);
 		}
 	}
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ad3445b..c7d3249 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3333,8 +3333,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	case SCTP_PARAM_DEL_IP:
@@ -3348,8 +3347,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	default:
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index a455271..51abcc9 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -755,7 +755,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	 * forward progress.
 	 */
 	if (t->dst)
-		dst_confirm(t->dst);
+		sctp_transport_dst_confirm(t);
 
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d699d2c..9e98c87 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 			list_for_each_entry(trans,
 			    &asoc->peer.transport_addr_list, transports) {
 				/* Clear the source and route cache */
-				dst_release(trans->dst);
+				sctp_transport_dst_release(trans);
 				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
 				    2*asoc->pathmtu, 4380));
 				trans->ssthresh = asoc->peer.i.a_rwnd;
@@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 */
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
-			dst_release(transport->dst);
+			sctp_transport_dst_release(transport);
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index baa1ac0..df274ff 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -240,7 +240,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
 	/* If we don't have a fresh route, look one up */
 	if (!transport->dst || transport->dst->obsolete) {
-		dst_release(transport->dst);
+		sctp_transport_dst_release(transport);
 		transport->af_specific->get_dst(transport, &transport->saddr,
 						&transport->fl, sk);
 	}
@@ -672,3 +672,18 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
 			sctp_transport_hold(t);
 	}
 }
+
+/* Drop dst */
+void sctp_transport_dst_release(struct sctp_transport *t)
+{
+	dst_release(t->dst);
+	t->dst = NULL;
+	t->dst_pending_confirm = 0;
+}
+
+/* Schedule neighbour confirm */
+void sctp_transport_dst_confirm(struct sctp_transport *t)
+{
+	t->dst_pending_confirm = 1;
+}
+
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 3/7] sctp: add dst_pending_confirm flag
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 17 ++++++++++++++++-
 9 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 3cfd365b..480b65a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -593,10 +593,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
  */
 static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
 {
-	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
-		dst_release(t->dst);
-		t->dst = NULL;
-	}
+	if (t->dst && !dst_check(t->dst, t->dst_cookie))
+		sctp_transport_dst_release(t);
 
 	return t->dst;
 }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 231fa9ac..6a68504 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -804,6 +804,8 @@ struct sctp_transport {
 
 	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
 
+	__u32 dst_pending_confirm;	/* need to confirm neighbour */
+
 	/* Destination */
 	struct dst_entry *dst;
 	/* Source address. */
@@ -950,6 +952,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 void sctp_transport_reset(struct sctp_transport *);
 void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
+void sctp_transport_dst_release(struct sctp_transport *t);
+void sctp_transport_dst_confirm(struct sctp_transport *t);
 
 
 /* This is the structure we use to queue packets as they come into
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index e50dc6d..2a6835b 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -832,8 +832,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		if (transport->state != SCTP_UNCONFIRMED)
 			transport->state = SCTP_INACTIVE;
 		else {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 			ulp_notify = false;
 		}
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 07ab506..814eac0 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -546,6 +546,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 	struct sctp_association *asoc = tp->asoc;
 	struct sctp_chunk *chunk, *tmp;
 	int pkt_count, gso = 0;
+	int confirm;
 	struct dst_entry *dst;
 	struct sk_buff *head;
 	struct sctphdr *sh;
@@ -624,7 +625,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 			asoc->peer.last_sent_to = tp;
 	}
 	head->ignore_df = packet->ipfragok;
-	tp->af_specific->sctp_xmit(head, tp);
+	confirm = tp->dst_pending_confirm;
+	if (confirm)
+		skb_set_dst_pending_confirm(head, 1);
+	/* neighbour should be confirmed on successful transmission or
+	 * positive error
+	 */
+	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
+		tp->dst_pending_confirm = 0;
 
 out:
 	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 65abe22..db352e5 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1654,7 +1654,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 
 		if (forward_progress) {
 			if (transport->dst)
-				dst_confirm(transport->dst);
+				sctp_transport_dst_confirm(transport);
 		}
 	}
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ad3445b..c7d3249 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3333,8 +3333,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	case SCTP_PARAM_DEL_IP:
@@ -3348,8 +3347,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	default:
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index a455271..51abcc9 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -755,7 +755,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	 * forward progress.
 	 */
 	if (t->dst)
-		dst_confirm(t->dst);
+		sctp_transport_dst_confirm(t);
 
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d699d2c..9e98c87 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 			list_for_each_entry(trans,
 			    &asoc->peer.transport_addr_list, transports) {
 				/* Clear the source and route cache */
-				dst_release(trans->dst);
+				sctp_transport_dst_release(trans);
 				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
 				    2*asoc->pathmtu, 4380));
 				trans->ssthresh = asoc->peer.i.a_rwnd;
@@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 */
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
-			dst_release(transport->dst);
+			sctp_transport_dst_release(transport);
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index baa1ac0..df274ff 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -240,7 +240,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
 	/* If we don't have a fresh route, look one up */
 	if (!transport->dst || transport->dst->obsolete) {
-		dst_release(transport->dst);
+		sctp_transport_dst_release(transport);
 		transport->af_specific->get_dst(transport, &transport->saddr,
 						&transport->fl, sk);
 	}
@@ -672,3 +672,18 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
 			sctp_transport_hold(t);
 	}
 }
+
+/* Drop dst */
+void sctp_transport_dst_release(struct sctp_transport *t)
+{
+	dst_release(t->dst);
+	t->dst = NULL;
+	t->dst_pending_confirm = 0;
+}
+
+/* Schedule neighbour confirm */
+void sctp_transport_dst_confirm(struct sctp_transport *t)
+{
+	t->dst_pending_confirm = 1;
+}
+
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/tcp_input.c   | 12 +++---------
 net/ipv4/tcp_metrics.c |  7 ++-----
 net/ipv4/tcp_output.c  |  2 ++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3de6eba..b3e88bb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3644,11 +3644,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
-	}
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+		sk_dst_confirm(sk);
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
@@ -5995,7 +5992,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	case TCP_FIN_WAIT1: {
-		struct dst_entry *dst;
 		int tmo;
 
 		/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6022,9 +6018,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		sk_dst_confirm(sk);
 
 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index b9ed0d5..0f46e5f 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -375,12 +375,10 @@ void tcp_update_metrics(struct sock *sk)
 	u32 val;
 	int m;
 
+	sk_dst_confirm(sk);
 	if (sysctl_tcp_nometrics_save || !dst)
 		return;
 
-	if (dst->flags & DST_HOST)
-		dst_confirm(dst);
-
 	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
 		/* This session failed to estimate rtt. Why?
@@ -493,11 +491,10 @@ void tcp_init_metrics(struct sock *sk)
 	struct tcp_metrics_block *tm;
 	u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+	sk_dst_confirm(sk);
 	if (!dst)
 		goto reset;
 
-	dst_confirm(dst);
-
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
 	if (!tm) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 671c695..c1f8a59 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -973,6 +973,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb_set_hash_from_sk(skb, sk);
 	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
+	skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm);
+
 	/* Build TCP header and checksum it. */
 	th = (struct tcphdr *)skb->data;
 	th->source		= inet->inet_sport;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/tcp_input.c   | 12 +++---------
 net/ipv4/tcp_metrics.c |  7 ++-----
 net/ipv4/tcp_output.c  |  2 ++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3de6eba..b3e88bb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3644,11 +3644,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
-	}
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+		sk_dst_confirm(sk);
 
 	if (icsk->icsk_pending = ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
@@ -5995,7 +5992,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	case TCP_FIN_WAIT1: {
-		struct dst_entry *dst;
 		int tmo;
 
 		/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6022,9 +6018,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		sk_dst_confirm(sk);
 
 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index b9ed0d5..0f46e5f 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -375,12 +375,10 @@ void tcp_update_metrics(struct sock *sk)
 	u32 val;
 	int m;
 
+	sk_dst_confirm(sk);
 	if (sysctl_tcp_nometrics_save || !dst)
 		return;
 
-	if (dst->flags & DST_HOST)
-		dst_confirm(dst);
-
 	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
 		/* This session failed to estimate rtt. Why?
@@ -493,11 +491,10 @@ void tcp_init_metrics(struct sock *sk)
 	struct tcp_metrics_block *tm;
 	u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+	sk_dst_confirm(sk);
 	if (!dst)
 		goto reset;
 
-	dst_confirm(dst);
-
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
 	if (!tm) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 671c695..c1f8a59 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -973,6 +973,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb_set_hash_from_sk(skb, sk);
 	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
+	skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm);
+
 	/* Build TCP header and checksum it. */
 	th = (struct tcphdr *)skb->data;
 	th->source		= inet->inet_sport;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 5/7] net: add confirm_neigh method to dst_ops
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/arp.h      | 16 ++++++++++++++++
 include/net/dst.h      |  7 +++++++
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h    | 17 +++++++++++++++++
 net/ipv4/route.c       | 19 +++++++++++++++++++
 net/ipv6/route.c       | 16 ++++++++++++++++
 net/xfrm/xfrm_policy.c | 16 ++++++++++++++++
 7 files changed, 93 insertions(+)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..65619a2 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -35,6 +35,22 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32
 	return n;
 }
 
+static inline void __ipv4_confirm_neigh(struct net_device *dev, u32 key)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv4_neigh_lookup_noref(dev, key);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 void arp_init(void);
 int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 void arp_send(int type, int ptype, __be32 dest_ip,
diff --git a/include/net/dst.h b/include/net/dst.h
index 6835d22..3a3b34b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,6 +477,13 @@ static inline struct neighbour *dst_neigh_lookup_skb(const struct dst_entry *dst
 	return IS_ERR(n) ? NULL : n;
 }
 
+static inline void dst_confirm_neigh(const struct dst_entry *dst,
+				     const void *daddr)
+{
+	if (dst->ops->confirm_neigh)
+		dst->ops->confirm_neigh(dst, daddr);
+}
+
 static inline void dst_link_failure(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 8a2b66d..13f6d59 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -33,6 +33,8 @@ struct dst_ops {
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
 						const void *daddr);
+	void			(*confirm_neigh)(const struct dst_entry *dst,
+						 const void *daddr);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index d562a2f..8a02146 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -391,6 +391,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 	return n;
 }
 
+static inline void __ipv6_confirm_neigh(struct net_device *dev,
+					const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(dev, pkey);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 int ndisc_init(void);
 int ndisc_late_init(void);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4b7c231..cb494a5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -154,6 +154,7 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 					   struct sk_buff *skb,
 					   const void *daddr);
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
 	.family =		AF_INET,
@@ -168,6 +169,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	.redirect =		ip_do_redirect,
 	.local_out =		__ip_local_out,
 	.neigh_lookup =		ipv4_neigh_lookup,
+	.confirm_neigh =	ipv4_confirm_neigh,
 };
 
 #define ECN_OR_COST(class)	TC_PRIO_##class
@@ -461,6 +463,23 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&arp_tbl, pkey, dev);
 }
 
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	const __be32 *pkey = daddr;
+	const struct rtable *rt;
+
+	rt = (const struct rtable *)dst;
+	if (rt->rt_gateway)
+		pkey = (const __be32 *)&rt->rt_gateway;
+	else if (!daddr ||
+		 (rt->rt_flags &
+		  (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+		return;
+
+	__ipv4_confirm_neigh(dev, *(__force u32 *)pkey);
+}
+
 #define IP_IDENTS_SZ 2048u
 
 static atomic_t *ip_idents __read_mostly;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4b1f0f9..c876940 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -217,6 +217,21 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&nd_tbl, daddr, dst->dev);
 }
 
+static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	struct rt6_info *rt = (struct rt6_info *)dst;
+
+	daddr = choose_neigh_daddr(rt, NULL, daddr);
+	if (!daddr)
+		return;
+	if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
+		return;
+	if (ipv6_addr_is_multicast((const struct in6_addr *)daddr))
+		return;
+	__ipv6_confirm_neigh(dev, daddr);
+}
+
 static struct dst_ops ip6_dst_ops_template = {
 	.family			=	AF_INET6,
 	.gc			=	ip6_dst_gc,
@@ -233,6 +248,7 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	.redirect		=	rt6_do_redirect,
 	.local_out		=	__ip6_local_out,
 	.neigh_lookup		=	ip6_neigh_lookup,
+	.confirm_neigh		=	ip6_confirm_neigh,
 };
 
 static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 177e208..c010ee0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2856,6 +2856,20 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 	return dst->path->ops->neigh_lookup(dst, skb, daddr);
 }
 
+static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	const struct dst_entry *path = dst->path;
+
+	if (path == dst) {
+		dst->ops->confirm_neigh(dst, daddr);
+	} else {
+		/* daddr can be from different family and we need the
+		 * tunnel address. How to get it?
+		 */
+		path->ops->confirm_neigh(path, NULL);
+	}
+}
+
 int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 {
 	int err = 0;
@@ -2882,6 +2896,8 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 			dst_ops->link_failure = xfrm_link_failure;
 		if (likely(dst_ops->neigh_lookup == NULL))
 			dst_ops->neigh_lookup = xfrm_neigh_lookup;
+		if (likely(!dst_ops->confirm_neigh))
+			dst_ops->confirm_neigh = xfrm_confirm_neigh;
 		if (likely(afinfo->garbage_collect == NULL))
 			afinfo->garbage_collect = xfrm_garbage_collect_deferred;
 		rcu_assign_pointer(xfrm_policy_afinfo[afinfo->family], afinfo);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 5/7] net: add confirm_neigh method to dst_ops
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/arp.h      | 16 ++++++++++++++++
 include/net/dst.h      |  7 +++++++
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h    | 17 +++++++++++++++++
 net/ipv4/route.c       | 19 +++++++++++++++++++
 net/ipv6/route.c       | 16 ++++++++++++++++
 net/xfrm/xfrm_policy.c | 16 ++++++++++++++++
 7 files changed, 93 insertions(+)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..65619a2 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -35,6 +35,22 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32
 	return n;
 }
 
+static inline void __ipv4_confirm_neigh(struct net_device *dev, u32 key)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv4_neigh_lookup_noref(dev, key);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 void arp_init(void);
 int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 void arp_send(int type, int ptype, __be32 dest_ip,
diff --git a/include/net/dst.h b/include/net/dst.h
index 6835d22..3a3b34b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,6 +477,13 @@ static inline struct neighbour *dst_neigh_lookup_skb(const struct dst_entry *dst
 	return IS_ERR(n) ? NULL : n;
 }
 
+static inline void dst_confirm_neigh(const struct dst_entry *dst,
+				     const void *daddr)
+{
+	if (dst->ops->confirm_neigh)
+		dst->ops->confirm_neigh(dst, daddr);
+}
+
 static inline void dst_link_failure(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 8a2b66d..13f6d59 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -33,6 +33,8 @@ struct dst_ops {
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
 						const void *daddr);
+	void			(*confirm_neigh)(const struct dst_entry *dst,
+						 const void *daddr);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index d562a2f..8a02146 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -391,6 +391,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 	return n;
 }
 
+static inline void __ipv6_confirm_neigh(struct net_device *dev,
+					const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(dev, pkey);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 int ndisc_init(void);
 int ndisc_late_init(void);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4b7c231..cb494a5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -154,6 +154,7 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 					   struct sk_buff *skb,
 					   const void *daddr);
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
 	.family =		AF_INET,
@@ -168,6 +169,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	.redirect =		ip_do_redirect,
 	.local_out =		__ip_local_out,
 	.neigh_lookup =		ipv4_neigh_lookup,
+	.confirm_neigh =	ipv4_confirm_neigh,
 };
 
 #define ECN_OR_COST(class)	TC_PRIO_##class
@@ -461,6 +463,23 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&arp_tbl, pkey, dev);
 }
 
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	const __be32 *pkey = daddr;
+	const struct rtable *rt;
+
+	rt = (const struct rtable *)dst;
+	if (rt->rt_gateway)
+		pkey = (const __be32 *)&rt->rt_gateway;
+	else if (!daddr ||
+		 (rt->rt_flags &
+		  (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+		return;
+
+	__ipv4_confirm_neigh(dev, *(__force u32 *)pkey);
+}
+
 #define IP_IDENTS_SZ 2048u
 
 static atomic_t *ip_idents __read_mostly;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4b1f0f9..c876940 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -217,6 +217,21 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&nd_tbl, daddr, dst->dev);
 }
 
+static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	struct rt6_info *rt = (struct rt6_info *)dst;
+
+	daddr = choose_neigh_daddr(rt, NULL, daddr);
+	if (!daddr)
+		return;
+	if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
+		return;
+	if (ipv6_addr_is_multicast((const struct in6_addr *)daddr))
+		return;
+	__ipv6_confirm_neigh(dev, daddr);
+}
+
 static struct dst_ops ip6_dst_ops_template = {
 	.family			=	AF_INET6,
 	.gc			=	ip6_dst_gc,
@@ -233,6 +248,7 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	.redirect		=	rt6_do_redirect,
 	.local_out		=	__ip6_local_out,
 	.neigh_lookup		=	ip6_neigh_lookup,
+	.confirm_neigh		=	ip6_confirm_neigh,
 };
 
 static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 177e208..c010ee0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2856,6 +2856,20 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 	return dst->path->ops->neigh_lookup(dst, skb, daddr);
 }
 
+static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	const struct dst_entry *path = dst->path;
+
+	if (path = dst) {
+		dst->ops->confirm_neigh(dst, daddr);
+	} else {
+		/* daddr can be from different family and we need the
+		 * tunnel address. How to get it?
+		 */
+		path->ops->confirm_neigh(path, NULL);
+	}
+}
+
 int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 {
 	int err = 0;
@@ -2882,6 +2896,8 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 			dst_ops->link_failure = xfrm_link_failure;
 		if (likely(dst_ops->neigh_lookup = NULL))
 			dst_ops->neigh_lookup = xfrm_neigh_lookup;
+		if (likely(!dst_ops->confirm_neigh))
+			dst_ops->confirm_neigh = xfrm_confirm_neigh;
 		if (likely(afinfo->garbage_collect = NULL))
 			afinfo->garbage_collect = xfrm_garbage_collect_deferred;
 		rcu_assign_pointer(xfrm_policy_afinfo[afinfo->family], afinfo);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/ip_output.c  |  6 ++++++
 net/ipv4/ping.c       |  3 ++-
 net/ipv4/raw.c        |  6 +++++-
 net/ipv4/udp.c        |  3 ++-
 net/ipv6/ip6_output.c |  6 ++++++
 net/ipv6/raw.c        |  6 +++++-
 net/ipv6/route.c      | 27 ++++++++++++++-------------
 net/ipv6/udp.c        |  3 ++-
 net/l2tp/l2tp_ip6.c   |  3 ++-
 9 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 27f1db7..ff0fcaa 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -889,6 +889,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb_set_dst_pending_confirm(skb, 1);
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1089,6 +1092,9 @@ static int __ip_append_data(struct sock *sk,
 			exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb_set_dst_pending_confirm(skb, 1);
+
 			/*
 			 * Put the packet on the pending queue.
 			 */
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 592db6a..6ee792d 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -848,7 +848,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4e49e5c..8119e1f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -383,6 +383,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 
 	sock_tx_timestamp(sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
 
+	if (flags & MSG_CONFIRM)
+		skb_set_dst_pending_confirm(skb, 1);
+
 	skb->transport_header = skb->network_header;
 	err = -EFAULT;
 	if (memcpy_from_msg(iph, msg, length))
@@ -666,7 +669,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return len;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d6dddcf..4bdb358 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1088,7 +1088,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4->daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7d90cab..5d944c1 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1145,6 +1145,9 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb_set_dst_pending_confirm(skb, 1);
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1517,6 +1520,9 @@ static int __ip6_append_data(struct sock *sk,
 			exthdrlen = 0;
 			dst_exthdrlen = 0;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb_set_dst_pending_confirm(skb, 1);
+
 			/*
 			 * Put the packet on the pending queue
 			 */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index ea89073..f174e76 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -654,6 +654,9 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	if (flags & MSG_CONFIRM)
+		skb_set_dst_pending_confirm(skb, 1);
+
 	skb->transport_header = skb->network_header;
 	err = memcpy_from_msg(iph, msg, length);
 	if (err)
@@ -934,7 +937,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c876940..15e45a6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1375,6 +1375,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
+	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	if (rt6->rt6i_flags & RTF_LOCAL)
@@ -1383,26 +1384,26 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;
 
-	dst_confirm(dst);
+	if (iph) {
+		daddr = &iph->daddr;
+		saddr = &iph->saddr;
+	} else if (sk) {
+		daddr = &sk->sk_v6_daddr;
+		saddr = &inet6_sk(sk)->saddr;
+	} else {
+		daddr = NULL;
+		saddr = NULL;
+	}
+	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
 
 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
-	} else {
-		const struct in6_addr *daddr, *saddr;
+	} else if (daddr) {
 		struct rt6_info *nrt6;
 
-		if (iph) {
-			daddr = &iph->daddr;
-			saddr = &iph->saddr;
-		} else if (sk) {
-			daddr = &sk->sk_v6_daddr;
-			saddr = &inet6_sk(sk)->saddr;
-		} else {
-			return;
-		}
 		nrt6 = ip6_rt_cache_alloc(rt6, daddr, saddr);
 		if (nrt6) {
 			rt6_do_update_pmtu(nrt6, mtu);
@@ -2274,7 +2275,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 	 * Look, redirects are sent only in response to data packets,
 	 * so that this nexthop apparently is reachable. --ANK
 	 */
-	dst_confirm(&rt->dst);
+	dst_confirm_neigh(&rt->dst, &ipv6_hdr(skb)->saddr);
 
 	neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
 	if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 05d6932..9402f7a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1295,7 +1295,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 4b06eb4..734798a 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -658,7 +658,8 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err < 0 ? err : len;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/ip_output.c  |  6 ++++++
 net/ipv4/ping.c       |  3 ++-
 net/ipv4/raw.c        |  6 +++++-
 net/ipv4/udp.c        |  3 ++-
 net/ipv6/ip6_output.c |  6 ++++++
 net/ipv6/raw.c        |  6 +++++-
 net/ipv6/route.c      | 27 ++++++++++++++-------------
 net/ipv6/udp.c        |  3 ++-
 net/l2tp/l2tp_ip6.c   |  3 ++-
 9 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 27f1db7..ff0fcaa 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -889,6 +889,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb_set_dst_pending_confirm(skb, 1);
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1089,6 +1092,9 @@ static int __ip_append_data(struct sock *sk,
 			exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb_set_dst_pending_confirm(skb, 1);
+
 			/*
 			 * Put the packet on the pending queue.
 			 */
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 592db6a..6ee792d 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -848,7 +848,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4e49e5c..8119e1f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -383,6 +383,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 
 	sock_tx_timestamp(sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
 
+	if (flags & MSG_CONFIRM)
+		skb_set_dst_pending_confirm(skb, 1);
+
 	skb->transport_header = skb->network_header;
 	err = -EFAULT;
 	if (memcpy_from_msg(iph, msg, length))
@@ -666,7 +669,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return len;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d6dddcf..4bdb358 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1088,7 +1088,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4->daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7d90cab..5d944c1 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1145,6 +1145,9 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb_set_dst_pending_confirm(skb, 1);
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1517,6 +1520,9 @@ static int __ip6_append_data(struct sock *sk,
 			exthdrlen = 0;
 			dst_exthdrlen = 0;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb_set_dst_pending_confirm(skb, 1);
+
 			/*
 			 * Put the packet on the pending queue
 			 */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index ea89073..f174e76 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -654,6 +654,9 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	if (flags & MSG_CONFIRM)
+		skb_set_dst_pending_confirm(skb, 1);
+
 	skb->transport_header = skb->network_header;
 	err = memcpy_from_msg(iph, msg, length);
 	if (err)
@@ -934,7 +937,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c876940..15e45a6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1375,6 +1375,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
+	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	if (rt6->rt6i_flags & RTF_LOCAL)
@@ -1383,26 +1384,26 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;
 
-	dst_confirm(dst);
+	if (iph) {
+		daddr = &iph->daddr;
+		saddr = &iph->saddr;
+	} else if (sk) {
+		daddr = &sk->sk_v6_daddr;
+		saddr = &inet6_sk(sk)->saddr;
+	} else {
+		daddr = NULL;
+		saddr = NULL;
+	}
+	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
 
 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
-	} else {
-		const struct in6_addr *daddr, *saddr;
+	} else if (daddr) {
 		struct rt6_info *nrt6;
 
-		if (iph) {
-			daddr = &iph->daddr;
-			saddr = &iph->saddr;
-		} else if (sk) {
-			daddr = &sk->sk_v6_daddr;
-			saddr = &inet6_sk(sk)->saddr;
-		} else {
-			return;
-		}
 		nrt6 = ip6_rt_cache_alloc(rt6, daddr, saddr);
 		if (nrt6) {
 			rt6_do_update_pmtu(nrt6, mtu);
@@ -2274,7 +2275,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 	 * Look, redirects are sent only in response to data packets,
 	 * so that this nexthop apparently is reachable. --ANK
 	 */
-	dst_confirm(&rt->dst);
+	dst_confirm_neigh(&rt->dst, &ipv6_hdr(skb)->saddr);
 
 	neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
 	if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 05d6932..9402f7a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1295,7 +1295,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 4b06eb4..734798a 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -658,7 +658,8 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err < 0 ? err : len;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 7/7] net: pending_confirm is not used anymore
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-28 14:26   ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/dst.h | 14 ++------------
 net/core/dst.c    |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3a3b34b..84a1043 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,8 +59,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE		0x0100
 #define DST_METADATA		0x0200
 
-	unsigned short		pending_confirm;
-
 	short			error;
 
 	/* A non-zero value of dst->obsolete forces by-hand validation
@@ -78,6 +76,8 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL	-2
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
+	unsigned short		__pad3;
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	__u32			tclassid;
 #else
@@ -440,7 +440,6 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-	dst->pending_confirm = 1;
 }
 
 static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
@@ -448,15 +447,6 @@ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
 {
 	const struct hh_cache *hh;
 
-	if (dst->pending_confirm) {
-		unsigned long now = jiffies;
-
-		dst->pending_confirm = 0;
-		/* avoid dirtying neighbour */
-		if (n->confirmed != now)
-			n->confirmed = now;
-	}
-
 	hh = &n->hh;
 	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
 		return neigh_hh_output(hh, skb);
diff --git a/net/core/dst.c b/net/core/dst.c
index b5cbbe0..960e503 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
-	dst->pending_confirm = 0;
 	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 RFC net-next 7/7] net: pending_confirm is not used anymore
@ 2017-01-28 14:26   ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-28 14:26 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/dst.h | 14 ++------------
 net/core/dst.c    |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3a3b34b..84a1043 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,8 +59,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE		0x0100
 #define DST_METADATA		0x0200
 
-	unsigned short		pending_confirm;
-
 	short			error;
 
 	/* A non-zero value of dst->obsolete forces by-hand validation
@@ -78,6 +76,8 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL	-2
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
+	unsigned short		__pad3;
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	__u32			tclassid;
 #else
@@ -440,7 +440,6 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-	dst->pending_confirm = 1;
 }
 
 static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
@@ -448,15 +447,6 @@ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
 {
 	const struct hh_cache *hh;
 
-	if (dst->pending_confirm) {
-		unsigned long now = jiffies;
-
-		dst->pending_confirm = 0;
-		/* avoid dirtying neighbour */
-		if (n->confirmed != now)
-			n->confirmed = now;
-	}
-
 	hh = &n->hh;
 	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
 		return neigh_hh_output(hh, skb);
diff --git a/net/core/dst.c b/net/core/dst.c
index b5cbbe0..960e503 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
-	dst->pending_confirm = 0;
 	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
  2017-01-28 14:26   ` Julian Anastasov
@ 2017-01-29 19:27     ` Eric Dumazet
  -1 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-01-29 19:27 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sat, 2017-01-28 at 16:26 +0200, Julian Anastasov wrote:
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> Use the new sk_dst_confirm() helper to propagate the
> indication from received packets to sock_confirm_neigh().
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---

For the whole series,

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
@ 2017-01-29 19:27     ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2017-01-29 19:27 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sat, 2017-01-28 at 16:26 +0200, Julian Anastasov wrote:
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> Use the new sk_dst_confirm() helper to propagate the
> indication from received packets to sock_confirm_neigh().
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---

For the whole series,

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 3/7] sctp: add dst_pending_confirm flag
  2017-01-28 14:26   ` Julian Anastasov
@ 2017-01-30 14:45     ` Neil Horman
  -1 siblings, 0 replies; 24+ messages in thread
From: Neil Horman @ 2017-01-30 14:45 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sat, Jan 28, 2017 at 04:26:14PM +0200, Julian Anastasov wrote:
> Add new transport flag to allow sockets to confirm neighbour.
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> The flag is propagated from transport to every packet.
> It is reset when cached dst is reset.
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  include/net/sctp/sctp.h    |  6 ++----
>  include/net/sctp/structs.h |  4 ++++
>  net/sctp/associola.c       |  3 +--
>  net/sctp/output.c          | 10 +++++++++-
>  net/sctp/outqueue.c        |  2 +-
>  net/sctp/sm_make_chunk.c   |  6 ++----
>  net/sctp/sm_sideeffect.c   |  2 +-
>  net/sctp/socket.c          |  4 ++--
>  net/sctp/transport.c       | 17 ++++++++++++++++-
>  9 files changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 3cfd365b..480b65a 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -593,10 +593,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
>   */
>  static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
>  {
> -	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
> -		dst_release(t->dst);
> -		t->dst = NULL;
> -	}
> +	if (t->dst && !dst_check(t->dst, t->dst_cookie))
> +		sctp_transport_dst_release(t);
>  
>  	return t->dst;
>  }
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 231fa9ac..6a68504 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -804,6 +804,8 @@ struct sctp_transport {
>  
>  	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
>  
> +	__u32 dst_pending_confirm;	/* need to confirm neighbour */
> +
>  	/* Destination */
>  	struct dst_entry *dst;
>  	/* Source address. */
> @@ -950,6 +952,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
>  void sctp_transport_reset(struct sctp_transport *);
>  void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
>  void sctp_transport_immediate_rtx(struct sctp_transport *);
> +void sctp_transport_dst_release(struct sctp_transport *t);
> +void sctp_transport_dst_confirm(struct sctp_transport *t);
>  
>  
>  /* This is the structure we use to queue packets as they come into
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index e50dc6d..2a6835b 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -832,8 +832,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		if (transport->state != SCTP_UNCONFIRMED)
>  			transport->state = SCTP_INACTIVE;
>  		else {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  			ulp_notify = false;
>  		}
>  
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 07ab506..814eac0 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -546,6 +546,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  	struct sctp_association *asoc = tp->asoc;
>  	struct sctp_chunk *chunk, *tmp;
>  	int pkt_count, gso = 0;
> +	int confirm;
>  	struct dst_entry *dst;
>  	struct sk_buff *head;
>  	struct sctphdr *sh;
> @@ -624,7 +625,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  			asoc->peer.last_sent_to = tp;
>  	}
>  	head->ignore_df = packet->ipfragok;
> -	tp->af_specific->sctp_xmit(head, tp);
> +	confirm = tp->dst_pending_confirm;
> +	if (confirm)
> +		skb_set_dst_pending_confirm(head, 1);
> +	/* neighbour should be confirmed on successful transmission or
> +	 * positive error
> +	 */
> +	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
> +		tp->dst_pending_confirm = 0;
>  
>  out:
>  	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 65abe22..db352e5 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1654,7 +1654,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  
>  		if (forward_progress) {
>  			if (transport->dst)
> -				dst_confirm(transport->dst);
> +				sctp_transport_dst_confirm(transport);
>  		}
>  	}
>  
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index ad3445b..c7d3249 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3333,8 +3333,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	case SCTP_PARAM_DEL_IP:
> @@ -3348,8 +3347,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	default:
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index a455271..51abcc9 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -755,7 +755,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	 * forward progress.
>  	 */
>  	if (t->dst)
> -		dst_confirm(t->dst);
> +		sctp_transport_dst_confirm(t);
>  
>  	/* The receiver of the HEARTBEAT ACK should also perform an
>  	 * RTT measurement for that destination transport address
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index d699d2c..9e98c87 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  			list_for_each_entry(trans,
>  			    &asoc->peer.transport_addr_list, transports) {
>  				/* Clear the source and route cache */
> -				dst_release(trans->dst);
> +				sctp_transport_dst_release(trans);
>  				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
>  				    2*asoc->pathmtu, 4380));
>  				trans->ssthresh = asoc->peer.i.a_rwnd;
> @@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 */
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  					transports) {
> -			dst_release(transport->dst);
> +			sctp_transport_dst_release(transport);
>  			sctp_transport_route(transport, NULL,
>  					     sctp_sk(asoc->base.sk));
>  		}
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index baa1ac0..df274ff 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -240,7 +240,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
>  {
>  	/* If we don't have a fresh route, look one up */
>  	if (!transport->dst || transport->dst->obsolete) {
> -		dst_release(transport->dst);
> +		sctp_transport_dst_release(transport);
>  		transport->af_specific->get_dst(transport, &transport->saddr,
>  						&transport->fl, sk);
>  	}
> @@ -672,3 +672,18 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
>  			sctp_transport_hold(t);
>  	}
>  }
> +
> +/* Drop dst */
> +void sctp_transport_dst_release(struct sctp_transport *t)
> +{
> +	dst_release(t->dst);
> +	t->dst = NULL;
> +	t->dst_pending_confirm = 0;
> +}
> +
> +/* Schedule neighbour confirm */
> +void sctp_transport_dst_confirm(struct sctp_transport *t)
> +{
> +	t->dst_pending_confirm = 1;
> +}
> +
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
For the SCTP bits:

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 3/7] sctp: add dst_pending_confirm flag
@ 2017-01-30 14:45     ` Neil Horman
  0 siblings, 0 replies; 24+ messages in thread
From: Neil Horman @ 2017-01-30 14:45 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sat, Jan 28, 2017 at 04:26:14PM +0200, Julian Anastasov wrote:
> Add new transport flag to allow sockets to confirm neighbour.
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> The flag is propagated from transport to every packet.
> It is reset when cached dst is reset.
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  include/net/sctp/sctp.h    |  6 ++----
>  include/net/sctp/structs.h |  4 ++++
>  net/sctp/associola.c       |  3 +--
>  net/sctp/output.c          | 10 +++++++++-
>  net/sctp/outqueue.c        |  2 +-
>  net/sctp/sm_make_chunk.c   |  6 ++----
>  net/sctp/sm_sideeffect.c   |  2 +-
>  net/sctp/socket.c          |  4 ++--
>  net/sctp/transport.c       | 17 ++++++++++++++++-
>  9 files changed, 38 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 3cfd365b..480b65a 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -593,10 +593,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
>   */
>  static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
>  {
> -	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
> -		dst_release(t->dst);
> -		t->dst = NULL;
> -	}
> +	if (t->dst && !dst_check(t->dst, t->dst_cookie))
> +		sctp_transport_dst_release(t);
>  
>  	return t->dst;
>  }
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 231fa9ac..6a68504 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -804,6 +804,8 @@ struct sctp_transport {
>  
>  	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
>  
> +	__u32 dst_pending_confirm;	/* need to confirm neighbour */
> +
>  	/* Destination */
>  	struct dst_entry *dst;
>  	/* Source address. */
> @@ -950,6 +952,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
>  void sctp_transport_reset(struct sctp_transport *);
>  void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
>  void sctp_transport_immediate_rtx(struct sctp_transport *);
> +void sctp_transport_dst_release(struct sctp_transport *t);
> +void sctp_transport_dst_confirm(struct sctp_transport *t);
>  
>  
>  /* This is the structure we use to queue packets as they come into
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index e50dc6d..2a6835b 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -832,8 +832,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		if (transport->state != SCTP_UNCONFIRMED)
>  			transport->state = SCTP_INACTIVE;
>  		else {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  			ulp_notify = false;
>  		}
>  
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 07ab506..814eac0 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -546,6 +546,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  	struct sctp_association *asoc = tp->asoc;
>  	struct sctp_chunk *chunk, *tmp;
>  	int pkt_count, gso = 0;
> +	int confirm;
>  	struct dst_entry *dst;
>  	struct sk_buff *head;
>  	struct sctphdr *sh;
> @@ -624,7 +625,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  			asoc->peer.last_sent_to = tp;
>  	}
>  	head->ignore_df = packet->ipfragok;
> -	tp->af_specific->sctp_xmit(head, tp);
> +	confirm = tp->dst_pending_confirm;
> +	if (confirm)
> +		skb_set_dst_pending_confirm(head, 1);
> +	/* neighbour should be confirmed on successful transmission or
> +	 * positive error
> +	 */
> +	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
> +		tp->dst_pending_confirm = 0;
>  
>  out:
>  	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 65abe22..db352e5 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1654,7 +1654,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  
>  		if (forward_progress) {
>  			if (transport->dst)
> -				dst_confirm(transport->dst);
> +				sctp_transport_dst_confirm(transport);
>  		}
>  	}
>  
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index ad3445b..c7d3249 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3333,8 +3333,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	case SCTP_PARAM_DEL_IP:
> @@ -3348,8 +3347,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	default:
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index a455271..51abcc9 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -755,7 +755,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	 * forward progress.
>  	 */
>  	if (t->dst)
> -		dst_confirm(t->dst);
> +		sctp_transport_dst_confirm(t);
>  
>  	/* The receiver of the HEARTBEAT ACK should also perform an
>  	 * RTT measurement for that destination transport address
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index d699d2c..9e98c87 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  			list_for_each_entry(trans,
>  			    &asoc->peer.transport_addr_list, transports) {
>  				/* Clear the source and route cache */
> -				dst_release(trans->dst);
> +				sctp_transport_dst_release(trans);
>  				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
>  				    2*asoc->pathmtu, 4380));
>  				trans->ssthresh = asoc->peer.i.a_rwnd;
> @@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 */
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  					transports) {
> -			dst_release(transport->dst);
> +			sctp_transport_dst_release(transport);
>  			sctp_transport_route(transport, NULL,
>  					     sctp_sk(asoc->base.sk));
>  		}
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index baa1ac0..df274ff 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -240,7 +240,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
>  {
>  	/* If we don't have a fresh route, look one up */
>  	if (!transport->dst || transport->dst->obsolete) {
> -		dst_release(transport->dst);
> +		sctp_transport_dst_release(transport);
>  		transport->af_specific->get_dst(transport, &transport->saddr,
>  						&transport->fl, sk);
>  	}
> @@ -672,3 +672,18 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
>  			sctp_transport_hold(t);
>  	}
>  }
> +
> +/* Drop dst */
> +void sctp_transport_dst_release(struct sctp_transport *t)
> +{
> +	dst_release(t->dst);
> +	t->dst = NULL;
> +	t->dst_pending_confirm = 0;
> +}
> +
> +/* Schedule neighbour confirm */
> +void sctp_transport_dst_confirm(struct sctp_transport *t)
> +{
> +	t->dst_pending_confirm = 1;
> +}
> +
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
For the SCTP bits:

Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
  2017-01-28 14:26 ` Julian Anastasov
@ 2017-01-30 15:13   ` David Miller
  -1 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2017-01-30 15:13 UTC (permalink / raw)
  To: ja; +Cc: netdev, linux-sctp, yuehaibing

From: Julian Anastasov <ja@ssi.bg>
Date: Sat, 28 Jan 2017 16:26:11 +0200

> 	This patchset addresses the problem of neighbour
> confirmation where received replies from one nexthop
> can cause confirmation of different nexthop when using
> the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
> for tracking the dst->pending_confirm problem.
> 
> 	Sockets can obtain cached output route. Such
> routes can be to known nexthop (rt_gateway=IP) or to be
> used simultaneously for different nexthop IPs by different
> subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).
> 
> 	At first look, there are more problems:
> 
> - dst_confirm() sets flag on dst and not on dst->path,
> as result, indication is lost when XFRM is used
> 
> - DNAT can change the nexthop, so the really used nexthop is
> not confirmed
> 
> 	So, the following solution is to avoid using
> dst->pending_confirm.

For the most part this series looks good to me, nice work.

> - I failed to understand the CXGB* code, I see dst_confirm()
> calls but I'm not sure dst_neigh_output() was called. For now
> I just removed the dst->pending_confirm flag and left all
> dst_confirm() calls there. Any better idea?

It is trying to manage the dst and neigh state for TCP connections
managed by it's offload engine.  So you will not therefore see any
actual packet output for the sessions it is performing confirmation
for.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
@ 2017-01-30 15:13   ` David Miller
  0 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2017-01-30 15:13 UTC (permalink / raw)
  To: ja; +Cc: netdev, linux-sctp, yuehaibing

From: Julian Anastasov <ja@ssi.bg>
Date: Sat, 28 Jan 2017 16:26:11 +0200

> 	This patchset addresses the problem of neighbour
> confirmation where received replies from one nexthop
> can cause confirmation of different nexthop when using
> the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
> for tracking the dst->pending_confirm problem.
> 
> 	Sockets can obtain cached output route. Such
> routes can be to known nexthop (rt_gateway=IP) or to be
> used simultaneously for different nexthop IPs by different
> subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).
> 
> 	At first look, there are more problems:
> 
> - dst_confirm() sets flag on dst and not on dst->path,
> as result, indication is lost when XFRM is used
> 
> - DNAT can change the nexthop, so the really used nexthop is
> not confirmed
> 
> 	So, the following solution is to avoid using
> dst->pending_confirm.

For the most part this series looks good to me, nice work.

> - I failed to understand the CXGB* code, I see dst_confirm()
> calls but I'm not sure dst_neigh_output() was called. For now
> I just removed the dst->pending_confirm flag and left all
> dst_confirm() calls there. Any better idea?

It is trying to manage the dst and neigh state for TCP connections
managed by it's offload engine.  So you will not therefore see any
actual packet output for the sessions it is performing confirmation
for.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
  2017-01-30 15:13   ` David Miller
@ 2017-01-31 21:53     ` Julian Anastasov
  -1 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-31 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-sctp, yuehaibing


	Hello,

On Mon, 30 Jan 2017, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> > 
> > 	So, the following solution is to avoid using
> > dst->pending_confirm.
> 
> For the most part this series looks good to me, nice work.

	OK, I'm posting v3 after removing the RFC tag from v2,
I just added the Acked-by tags.

Regards

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement
@ 2017-01-31 21:53     ` Julian Anastasov
  0 siblings, 0 replies; 24+ messages in thread
From: Julian Anastasov @ 2017-01-31 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-sctp, yuehaibing


	Hello,

On Mon, 30 Jan 2017, David Miller wrote:

> From: Julian Anastasov <ja@ssi.bg>
> > 
> > 	So, the following solution is to avoid using
> > dst->pending_confirm.
> 
> For the most part this series looks good to me, nice work.

	OK, I'm posting v3 after removing the RFC tag from v2,
I just added the Acked-by tags.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-01-31 21:53 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-28 14:26 [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement Julian Anastasov
2017-01-28 14:26 ` Julian Anastasov
2017-01-28 14:26 ` [PATCHv2 RFC net-next 1/7] sock: add sk_dst_pending_confirm flag Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-28 14:26 ` [PATCHv2 RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-28 14:26 ` [PATCHv2 RFC net-next 3/7] sctp: add dst_pending_confirm flag Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-30 14:45   ` Neil Horman
2017-01-30 14:45     ` Neil Horman
2017-01-28 14:26 ` [PATCHv2 RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-29 19:27   ` Eric Dumazet
2017-01-29 19:27     ` Eric Dumazet
2017-01-28 14:26 ` [PATCHv2 RFC net-next 5/7] net: add confirm_neigh method to dst_ops Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-28 14:26 ` [PATCHv2 RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-28 14:26 ` [PATCHv2 RFC net-next 7/7] net: pending_confirm is not used anymore Julian Anastasov
2017-01-28 14:26   ` Julian Anastasov
2017-01-30 15:13 ` [PATCHv2 RFC net-next 0/7] net: dst_confirm replacement David Miller
2017-01-30 15:13   ` David Miller
2017-01-31 21:53   ` Julian Anastasov
2017-01-31 21:53     ` Julian Anastasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.