All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC net-next 0/7] net: dst_confirm replacement
@ 2016-12-18 20:56 ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

	This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.

	Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).

	At first look, there are more problems:

- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used

- DNAT can change the nexthop, so the really used nexthop is
not confirmed

	So, the following solution is to avoid using
dst->pending_confirm.

	The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect


	The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag

- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm

2. net: add dst_pending_confirm flag to skbuff

- skb->dst_pending_confirm will be used by all protocols
in following patches

3. sctp: add dst_pending_confirm flag

- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP

4. tcp: replace dst_confirm with sk_dst_confirm

5. net: add confirm_neigh method to dst_ops

- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.

- xfrm_confirm_neigh: support is incomplete here, only routes with
known nexthops (gateway) are supported because the tunnel address
is slow to obtain. Or there is solution to this problem?

6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6

- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).

- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.

7. net: pending_confirm is not used anymore

- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?

- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?


Julian Anastasov (7):
  sock: add sk_dst_pending_confirm flag
  net: add dst_pending_confirm flag to skbuff
  sctp: add dst_pending_confirm flag
  tcp: replace dst_confirm with sk_dst_confirm
  net: add confirm_neigh method to dst_ops
  net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  net: pending_confirm is not used anymore

 drivers/net/vrf.c          |  5 ++++-
 include/linux/skbuff.h     |  4 +++-
 include/net/arp.h          | 16 ++++++++++++++++
 include/net/dst.h          | 21 +++++++++------------
 include/net/dst_ops.h      |  2 ++
 include/net/ndisc.h        | 17 +++++++++++++++++
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 include/net/sock.h         | 28 +++++++++++++++++++++++++++-
 net/core/dst.c             |  1 -
 net/core/sock.c            |  2 ++
 net/ipv4/ip_output.c       | 11 ++++++++++-
 net/ipv4/ping.c            |  3 ++-
 net/ipv4/raw.c             |  6 +++++-
 net/ipv4/route.c           | 19 +++++++++++++++++++
 net/ipv4/tcp_input.c       | 12 +++---------
 net/ipv4/tcp_metrics.c     |  7 ++-----
 net/ipv4/tcp_output.c      |  2 ++
 net/ipv4/udp.c             |  3 ++-
 net/ipv6/ip6_output.c      |  7 +++++++
 net/ipv6/raw.c             |  6 +++++-
 net/ipv6/route.c           | 43 ++++++++++++++++++++++++++++++-------------
 net/ipv6/udp.c             |  3 ++-
 net/l2tp/l2tp_ip6.c        |  3 ++-
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 18 +++++++++++++++++-
 net/xfrm/xfrm_policy.c     | 16 ++++++++++++++++
 32 files changed, 226 insertions(+), 66 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 0/7] net: dst_confirm replacement
@ 2016-12-18 20:56 ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

	This patchset addresses the problem of neighbour
confirmation where received replies from one nexthop
can cause confirmation of different nexthop when using
the same dst. Thanks to YueHaibing <yuehaibing@huawei.com>
for tracking the dst->pending_confirm problem.

	Sockets can obtain cached output route. Such
routes can be to known nexthop (rt_gateway=IP) or to be
used simultaneously for different nexthop IPs by different
subnet prefixes (nh->nh_scope = RT_SCOPE_HOST, rt_gateway=0).

	At first look, there are more problems:

- dst_confirm() sets flag on dst and not on dst->path,
as result, indication is lost when XFRM is used

- DNAT can change the nexthop, so the really used nexthop is
not confirmed

	So, the following solution is to avoid using
dst->pending_confirm.

	The current dst_confirm() usage is as follows:

Protocols confirming dst on received packets:
- TCP (1 dst per socket)
- SCTP (1 dst per transport)
- CXGB*

Protocols supporting sendmsg with MSG_CONFIRM [ | MSG_PROBE ] to
confirm neighbour:
- UDP IPv4/IPv6
- ICMPv4 PING
- RAW IPv4/IPv6
- L2TP/IPv6

MSG_CONFIRM for other purposes (fix not needed):
- CAN

Sending without locking the socket:
- UDP (when no cork)
- RAW (when hdrincl=1)

Redirects from old to new GW:
- rt6_do_redirect


	The patchset includes the following changes:

1. sock: add sk_dst_pending_confirm flag

- used only by TCP with patch 4 to remember the received
indication in sk->sk_dst_pending_confirm

2. net: add dst_pending_confirm flag to skbuff

- skb->dst_pending_confirm will be used by all protocols
in following patches

3. sctp: add dst_pending_confirm flag

- SCTP uses per-transport dsts and can not use
sk->sk_dst_pending_confirm like TCP

4. tcp: replace dst_confirm with sk_dst_confirm

5. net: add confirm_neigh method to dst_ops

- IPv4 and IPv6 provision for slow neigh lookups for MSG_PROBE users.
I decided to use neigh lookup only for this case because on
MSG_PROBE the skb may pass MTU checks but it does not reach
the neigh confirmation code. This patch will be used from patch 6.

- xfrm_confirm_neigh: support is incomplete here, only routes with
known nexthops (gateway) are supported because the tunnel address
is slow to obtain. Or there is solution to this problem?

6. net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP

- dst_confirm conversion for UDP, RAW, ICMP and L2TP/IPv6

- these protocols use MSG_CONFIRM propagated by ip*_append_data
to skb->dst_pending_confirm. sk->sk_dst_pending_confirm is not
used because some sending paths do not lock the socket. For
MSG_PROBE we use the slow lookup (dst_confirm_neigh).

- there are also 2 cases that need the slow lookup:
__ip6_rt_update_pmtu and rt6_do_redirect. I hope
&ipv6_hdr(skb)->saddr is the correct nexthop address to use here.

7. net: pending_confirm is not used anymore

- I failed to understand the CXGB* code, I see dst_confirm()
calls but I'm not sure dst_neigh_output() was called. For now
I just removed the dst->pending_confirm flag and left all
dst_confirm() calls there. Any better idea?

- Now may be old function neigh_output() should be restored
instead of dst_neigh_output?


Julian Anastasov (7):
  sock: add sk_dst_pending_confirm flag
  net: add dst_pending_confirm flag to skbuff
  sctp: add dst_pending_confirm flag
  tcp: replace dst_confirm with sk_dst_confirm
  net: add confirm_neigh method to dst_ops
  net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  net: pending_confirm is not used anymore

 drivers/net/vrf.c          |  5 ++++-
 include/linux/skbuff.h     |  4 +++-
 include/net/arp.h          | 16 ++++++++++++++++
 include/net/dst.h          | 21 +++++++++------------
 include/net/dst_ops.h      |  2 ++
 include/net/ndisc.h        | 17 +++++++++++++++++
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 include/net/sock.h         | 28 +++++++++++++++++++++++++++-
 net/core/dst.c             |  1 -
 net/core/sock.c            |  2 ++
 net/ipv4/ip_output.c       | 11 ++++++++++-
 net/ipv4/ping.c            |  3 ++-
 net/ipv4/raw.c             |  6 +++++-
 net/ipv4/route.c           | 19 +++++++++++++++++++
 net/ipv4/tcp_input.c       | 12 +++---------
 net/ipv4/tcp_metrics.c     |  7 ++-----
 net/ipv4/tcp_output.c      |  2 ++
 net/ipv4/udp.c             |  3 ++-
 net/ipv6/ip6_output.c      |  7 +++++++
 net/ipv6/raw.c             |  6 +++++-
 net/ipv6/route.c           | 43 ++++++++++++++++++++++++++++++-------------
 net/ipv6/udp.c             |  3 ++-
 net/l2tp/l2tp_ip6.c        |  3 ++-
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 18 +++++++++++++++++-
 net/xfrm/xfrm_policy.c     | 16 ++++++++++++++++
 32 files changed, 226 insertions(+), 66 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 1/7] sock: add sk_dst_pending_confirm flag
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sock.h | 14 +++++++++++++-
 net/core/sock.c    |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 282d065..e83bb01 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -239,6 +239,7 @@ struct sock_common {
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
+  *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
@@ -381,7 +382,7 @@ struct sock {
 #endif
 	struct dst_entry	*sk_rx_dst;
 	struct dst_entry __rcu	*sk_dst_cache;
-	atomic_t		sk_omem_alloc;
+	int			sk_dst_pending_confirm;
 	int			sk_sndbuf;
 
 	/* ===== cache line for TX ===== */
@@ -405,6 +406,8 @@ struct sock {
 	unsigned int		sk_gso_max_size;
 	gfp_t			sk_allocation;
 	__u32			sk_txhash;
+	atomic_t		sk_omem_alloc;
+	/* Note: 32bit hole on 64bit arches */
 
 	/*
 	 * Because of non atomicity rules, all
@@ -1761,6 +1764,7 @@ static inline void dst_negative_advice(struct sock *sk)
 		if (ndst != dst) {
 			rcu_assign_pointer(sk->sk_dst_cache, ndst);
 			sk_tx_queue_clear(sk);
+			sk->sk_dst_pending_confirm = 0;
 		}
 	}
 }
@@ -1771,6 +1775,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	/*
 	 * This can be called while sk is owned by the caller only,
 	 * with no state that can be checked in a rcu_dereference_check() cond
@@ -1786,6 +1791,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst);
 	dst_release(old_dst);
 }
@@ -1806,6 +1812,12 @@ static inline void dst_negative_advice(struct sock *sk)
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+static inline void sk_dst_confirm(struct sock *sk)
+{
+	if (!sk->sk_dst_pending_confirm)
+		sk->sk_dst_pending_confirm = 1;
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/core/sock.c b/net/core/sock.c
index 9fa46b9..8af5296 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -502,6 +502,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
 		sk_tx_queue_clear(sk);
+		sk->sk_dst_pending_confirm = 0;
 		RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
 		dst_release(dst);
 		return NULL;
@@ -1522,6 +1523,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 				af_family_clock_key_strings[newsk->sk_family]);
 
 		newsk->sk_dst_cache	= NULL;
+		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
 		atomic_set(&newsk->sk_drops, 0);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 1/7] sock: add sk_dst_pending_confirm flag
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new sock flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As not all call paths lock the socket use full word for
the flag.

Add sk_dst_confirm as replacement for dst_confirm when
called for received packets.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sock.h | 14 +++++++++++++-
 net/core/sock.c    |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 282d065..e83bb01 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -239,6 +239,7 @@ struct sock_common {
   *	@sk_wq: sock wait queue and async head
   *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
+  *	@sk_dst_pending_confirm: need to confirm neighbour
   *	@sk_policy: flow policy
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
@@ -381,7 +382,7 @@ struct sock {
 #endif
 	struct dst_entry	*sk_rx_dst;
 	struct dst_entry __rcu	*sk_dst_cache;
-	atomic_t		sk_omem_alloc;
+	int			sk_dst_pending_confirm;
 	int			sk_sndbuf;
 
 	/* === cache line for TX === */
@@ -405,6 +406,8 @@ struct sock {
 	unsigned int		sk_gso_max_size;
 	gfp_t			sk_allocation;
 	__u32			sk_txhash;
+	atomic_t		sk_omem_alloc;
+	/* Note: 32bit hole on 64bit arches */
 
 	/*
 	 * Because of non atomicity rules, all
@@ -1761,6 +1764,7 @@ static inline void dst_negative_advice(struct sock *sk)
 		if (ndst != dst) {
 			rcu_assign_pointer(sk->sk_dst_cache, ndst);
 			sk_tx_queue_clear(sk);
+			sk->sk_dst_pending_confirm = 0;
 		}
 	}
 }
@@ -1771,6 +1775,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	/*
 	 * This can be called while sk is owned by the caller only,
 	 * with no state that can be checked in a rcu_dereference_check() cond
@@ -1786,6 +1791,7 @@ static inline void dst_negative_advice(struct sock *sk)
 	struct dst_entry *old_dst;
 
 	sk_tx_queue_clear(sk);
+	sk->sk_dst_pending_confirm = 0;
 	old_dst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, dst);
 	dst_release(old_dst);
 }
@@ -1806,6 +1812,12 @@ static inline void dst_negative_advice(struct sock *sk)
 
 struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
+static inline void sk_dst_confirm(struct sock *sk)
+{
+	if (!sk->sk_dst_pending_confirm)
+		sk->sk_dst_pending_confirm = 1;
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/core/sock.c b/net/core/sock.c
index 9fa46b9..8af5296 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -502,6 +502,7 @@ struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) = NULL) {
 		sk_tx_queue_clear(sk);
+		sk->sk_dst_pending_confirm = 0;
 		RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
 		dst_release(dst);
 		return NULL;
@@ -1522,6 +1523,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 				af_family_clock_key_strings[newsk->sk_family]);
 
 		newsk->sk_dst_cache	= NULL;
+		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
 		atomic_set(&newsk->sk_drops, 0);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 drivers/net/vrf.c      |  5 ++++-
 include/linux/skbuff.h |  4 +++-
 include/net/sock.h     | 14 ++++++++++++++
 net/ipv4/ip_output.c   |  5 ++++-
 net/ipv6/ip6_output.c  |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7532646..b118d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -377,6 +377,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
@@ -573,8 +574,10 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
-	if (!IS_ERR(neigh))
+	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
+	}
 
 	rcu_read_unlock_bh();
 err:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac7fa34..94d7c36 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -610,6 +610,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
+ *	@dst_pending_confirm: need to confirm neighbour
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
@@ -740,6 +741,7 @@ struct sk_buff {
 	__u8			csum_level:2;
 	__u8			csum_bad:1;
 
+	__u8			dst_pending_confirm:1;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8			ndisc_nodetype:2;
 #endif
@@ -749,7 +751,7 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
 #endif
-	/* 2, 4 or 5 bit hole */
+	/* 1, 3 or 4 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/include/net/sock.h b/include/net/sock.h
index e83bb01..bd63d4d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1818,6 +1818,20 @@ static inline void sk_dst_confirm(struct sock *sk)
 		sk->sk_dst_pending_confirm = 1;
 }
 
+static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
+{
+	if (unlikely(skb->dst_pending_confirm)) {
+		struct sock *sk = skb->sk;
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+		if (sk && sk->sk_dst_pending_confirm)
+			sk->sk_dst_pending_confirm = 0;
+	}
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6c9615c..fbe63cc 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -222,7 +222,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		sock_confirm_neigh(skb, neigh);
+		res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 70d0de40..285aa9f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -119,6 +119,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 drivers/net/vrf.c      |  5 ++++-
 include/linux/skbuff.h |  4 +++-
 include/net/sock.h     | 14 ++++++++++++++
 net/ipv4/ip_output.c   |  5 ++++-
 net/ipv6/ip6_output.c  |  1 +
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7532646..b118d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -377,6 +377,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
@@ -573,8 +574,10 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
-	if (!IS_ERR(neigh))
+	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
+	}
 
 	rcu_read_unlock_bh();
 err:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac7fa34..94d7c36 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -610,6 +610,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
+ *	@dst_pending_confirm: need to confirm neighbour
   *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
@@ -740,6 +741,7 @@ struct sk_buff {
 	__u8			csum_level:2;
 	__u8			csum_bad:1;
 
+	__u8			dst_pending_confirm:1;
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	__u8			ndisc_nodetype:2;
 #endif
@@ -749,7 +751,7 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
 #endif
-	/* 2, 4 or 5 bit hole */
+	/* 1, 3 or 4 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/include/net/sock.h b/include/net/sock.h
index e83bb01..bd63d4d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1818,6 +1818,20 @@ static inline void sk_dst_confirm(struct sock *sk)
 		sk->sk_dst_pending_confirm = 1;
 }
 
+static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
+{
+	if (unlikely(skb->dst_pending_confirm)) {
+		struct sock *sk = skb->sk;
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+		if (sk && sk->sk_dst_pending_confirm)
+			sk->sk_dst_pending_confirm = 0;
+	}
+}
+
 bool sk_mc_loop(struct sock *sk);
 
 static inline bool sk_can_gso(const struct sock *sk)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6c9615c..fbe63cc 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -222,7 +222,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		sock_confirm_neigh(skb, neigh);
+		res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 70d0de40..285aa9f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -119,6 +119,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
 	if (!IS_ERR(neigh)) {
+		sock_confirm_neigh(skb, neigh);
 		ret = dst_neigh_output(dst, neigh, skb);
 		rcu_read_unlock_bh();
 		return ret;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 18 +++++++++++++++++-
 9 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f0dcaeb..85d9083 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -586,10 +586,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
  */
 static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
 {
-	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
-		dst_release(t->dst);
-		t->dst = NULL;
-	}
+	if (t->dst && !dst_check(t->dst, t->dst_cookie))
+		sctp_transport_dst_release(t);
 
 	return t->dst;
 }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 92daabd..e842e84 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -838,6 +838,8 @@ struct sctp_transport {
 
 	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
 
+	__u32 dst_pending_confirm;	/* need to confirm neighbour */
+
 	/* Destination */
 	struct dst_entry *dst;
 	/* Source address. */
@@ -980,6 +982,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 void sctp_transport_reset(struct sctp_transport *);
 void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
+void sctp_transport_dst_release(struct sctp_transport *t);
+void sctp_transport_dst_confirm(struct sctp_transport *t);
 
 
 /* This is the structure we use to queue packets as they come into
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 68428e1..7bd26e0 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -820,8 +820,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		if (transport->state != SCTP_UNCONFIRMED)
 			transport->state = SCTP_INACTIVE;
 		else {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 			ulp_notify = false;
 		}
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f5320a8..4684a00 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -550,6 +550,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 	struct sctp_association *asoc = tp->asoc;
 	struct sctp_chunk *chunk, *tmp;
 	int pkt_count, gso = 0;
+	int confirm;
 	struct dst_entry *dst;
 	struct sk_buff *head;
 	struct sctphdr *sh;
@@ -628,7 +629,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 			asoc->peer.last_sent_to = tp;
 	}
 	head->ignore_df = packet->ipfragok;
-	tp->af_specific->sctp_xmit(head, tp);
+	confirm = tp->dst_pending_confirm;
+	if (confirm)
+		head->dst_pending_confirm = 1;
+	/* neighbour should be confirmed on successful transmission or
+	 * positive error
+	 */
+	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
+		tp->dst_pending_confirm = 0;
 
 out:
 	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e540826..8234799 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1641,7 +1641,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 
 		if (forward_progress) {
 			if (transport->dst)
-				dst_confirm(transport->dst);
+				sctp_transport_dst_confirm(transport);
 		}
 	}
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9e9690b..6fb15bf 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3317,8 +3317,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	case SCTP_PARAM_DEL_IP:
@@ -3332,8 +3331,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	default:
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c345bf1..9255b29 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -723,7 +723,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	 * forward progress.
 	 */
 	if (t->dst)
-		dst_confirm(t->dst);
+		sctp_transport_dst_confirm(t);
 
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 318c678..9ee676a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 			list_for_each_entry(trans,
 			    &asoc->peer.transport_addr_list, transports) {
 				/* Clear the source and route cache */
-				dst_release(trans->dst);
+				sctp_transport_dst_release(trans);
 				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
 				    2*asoc->pathmtu, 4380));
 				trans->ssthresh = asoc->peer.i.a_rwnd;
@@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 */
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
-			dst_release(transport->dst);
+			sctp_transport_dst_release(transport);
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index ce54dce..db66514 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -227,7 +227,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
 	/* If we don't have a fresh route, look one up */
 	if (!transport->dst || transport->dst->obsolete) {
-		dst_release(transport->dst);
+		sctp_transport_dst_release(transport);
 		transport->af_specific->get_dst(transport, &transport->saddr,
 						&transport->fl, sk);
 	}
@@ -659,3 +659,19 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
 			sctp_transport_hold(t);
 	}
 }
+
+/* Drop dst */
+void sctp_transport_dst_release(struct sctp_transport *t)
+{
+	dst_release(t->dst);
+	t->dst = NULL;
+	t->dst_pending_confirm = 0;
+}
+
+/* Schedule neighbour confirm */
+void sctp_transport_dst_confirm(struct sctp_transport *t)
+{
+	if (!t->dst_pending_confirm)
+		t->dst_pending_confirm = 1;
+}
+
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/sctp/sctp.h    |  6 ++----
 include/net/sctp/structs.h |  4 ++++
 net/sctp/associola.c       |  3 +--
 net/sctp/output.c          | 10 +++++++++-
 net/sctp/outqueue.c        |  2 +-
 net/sctp/sm_make_chunk.c   |  6 ++----
 net/sctp/sm_sideeffect.c   |  2 +-
 net/sctp/socket.c          |  4 ++--
 net/sctp/transport.c       | 18 +++++++++++++++++-
 9 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index f0dcaeb..85d9083 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -586,10 +586,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
  */
 static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
 {
-	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
-		dst_release(t->dst);
-		t->dst = NULL;
-	}
+	if (t->dst && !dst_check(t->dst, t->dst_cookie))
+		sctp_transport_dst_release(t);
 
 	return t->dst;
 }
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 92daabd..e842e84 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -838,6 +838,8 @@ struct sctp_transport {
 
 	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
 
+	__u32 dst_pending_confirm;	/* need to confirm neighbour */
+
 	/* Destination */
 	struct dst_entry *dst;
 	/* Source address. */
@@ -980,6 +982,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 void sctp_transport_reset(struct sctp_transport *);
 void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
+void sctp_transport_dst_release(struct sctp_transport *t);
+void sctp_transport_dst_confirm(struct sctp_transport *t);
 
 
 /* This is the structure we use to queue packets as they come into
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 68428e1..7bd26e0 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -820,8 +820,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		if (transport->state != SCTP_UNCONFIRMED)
 			transport->state = SCTP_INACTIVE;
 		else {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 			ulp_notify = false;
 		}
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index f5320a8..4684a00 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -550,6 +550,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 	struct sctp_association *asoc = tp->asoc;
 	struct sctp_chunk *chunk, *tmp;
 	int pkt_count, gso = 0;
+	int confirm;
 	struct dst_entry *dst;
 	struct sk_buff *head;
 	struct sctphdr *sh;
@@ -628,7 +629,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 			asoc->peer.last_sent_to = tp;
 	}
 	head->ignore_df = packet->ipfragok;
-	tp->af_specific->sctp_xmit(head, tp);
+	confirm = tp->dst_pending_confirm;
+	if (confirm)
+		head->dst_pending_confirm = 1;
+	/* neighbour should be confirmed on successful transmission or
+	 * positive error
+	 */
+	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
+		tp->dst_pending_confirm = 0;
 
 out:
 	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e540826..8234799 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1641,7 +1641,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 
 		if (forward_progress) {
 			if (transport->dst)
-				dst_confirm(transport->dst);
+				sctp_transport_dst_confirm(transport);
 		}
 	}
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 9e9690b..6fb15bf 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3317,8 +3317,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	case SCTP_PARAM_DEL_IP:
@@ -3332,8 +3331,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			dst_release(transport->dst);
-			transport->dst = NULL;
+			sctp_transport_dst_release(transport);
 		}
 		break;
 	default:
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c345bf1..9255b29 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -723,7 +723,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
 	 * forward progress.
 	 */
 	if (t->dst)
-		dst_confirm(t->dst);
+		sctp_transport_dst_confirm(t);
 
 	/* The receiver of the HEARTBEAT ACK should also perform an
 	 * RTT measurement for that destination transport address
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 318c678..9ee676a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 			list_for_each_entry(trans,
 			    &asoc->peer.transport_addr_list, transports) {
 				/* Clear the source and route cache */
-				dst_release(trans->dst);
+				sctp_transport_dst_release(trans);
 				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
 				    2*asoc->pathmtu, 4380));
 				trans->ssthresh = asoc->peer.i.a_rwnd;
@@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		 */
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
-			dst_release(transport->dst);
+			sctp_transport_dst_release(transport);
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index ce54dce..db66514 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -227,7 +227,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
 	/* If we don't have a fresh route, look one up */
 	if (!transport->dst || transport->dst->obsolete) {
-		dst_release(transport->dst);
+		sctp_transport_dst_release(transport);
 		transport->af_specific->get_dst(transport, &transport->saddr,
 						&transport->fl, sk);
 	}
@@ -659,3 +659,19 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
 			sctp_transport_hold(t);
 	}
 }
+
+/* Drop dst */
+void sctp_transport_dst_release(struct sctp_transport *t)
+{
+	dst_release(t->dst);
+	t->dst = NULL;
+	t->dst_pending_confirm = 0;
+}
+
+/* Schedule neighbour confirm */
+void sctp_transport_dst_confirm(struct sctp_transport *t)
+{
+	if (!t->dst_pending_confirm)
+		t->dst_pending_confirm = 1;
+}
+
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/tcp_input.c   | 12 +++---------
 net/ipv4/tcp_metrics.c |  7 ++-----
 net/ipv4/tcp_output.c  |  2 ++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6c79075..005c428 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,11 +3698,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
-	}
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+		sk_dst_confirm(sk);
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
@@ -6022,7 +6019,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	case TCP_FIN_WAIT1: {
-		struct dst_entry *dst;
 		int tmo;
 
 		/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6049,9 +6045,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		sk_dst_confirm(sk);
 
 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5..1b335d6 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -375,12 +375,10 @@ void tcp_update_metrics(struct sock *sk)
 	u32 val;
 	int m;
 
+	sk_dst_confirm(sk);
 	if (sysctl_tcp_nometrics_save || !dst)
 		return;
 
-	if (dst->flags & DST_HOST)
-		dst_confirm(dst);
-
 	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
 		/* This session failed to estimate rtt. Why?
@@ -493,11 +491,10 @@ void tcp_init_metrics(struct sock *sk)
 	struct tcp_metrics_block *tm;
 	u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+	sk_dst_confirm(sk);
 	if (!dst)
 		goto reset;
 
-	dst_confirm(dst);
-
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
 	if (!tm) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b45101f..dc01f35 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -974,6 +974,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb_set_hash_from_sk(skb, sk);
 	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
+	skb->dst_pending_confirm = sk->sk_dst_pending_confirm;
+
 	/* Build TCP header and checksum it. */
 	th = (struct tcphdr *)skb->data;
 	th->source		= inet->inet_sport;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/tcp_input.c   | 12 +++---------
 net/ipv4/tcp_metrics.c |  7 ++-----
 net/ipv4/tcp_output.c  |  2 ++
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6c79075..005c428 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,11 +3698,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	if (tp->tlp_high_seq)
 		tcp_process_tlp_ack(sk, ack, flag);
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
-	}
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
+		sk_dst_confirm(sk);
 
 	if (icsk->icsk_pending = ICSK_TIME_RETRANS)
 		tcp_schedule_loss_probe(sk);
@@ -6022,7 +6019,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		break;
 
 	case TCP_FIN_WAIT1: {
-		struct dst_entry *dst;
 		int tmo;
 
 		/* If we enter the TCP_FIN_WAIT1 state and we are a
@@ -6049,9 +6045,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		sk_dst_confirm(sk);
 
 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index d46f4d5..1b335d6 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -375,12 +375,10 @@ void tcp_update_metrics(struct sock *sk)
 	u32 val;
 	int m;
 
+	sk_dst_confirm(sk);
 	if (sysctl_tcp_nometrics_save || !dst)
 		return;
 
-	if (dst->flags & DST_HOST)
-		dst_confirm(dst);
-
 	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
 		/* This session failed to estimate rtt. Why?
@@ -493,11 +491,10 @@ void tcp_init_metrics(struct sock *sk)
 	struct tcp_metrics_block *tm;
 	u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+	sk_dst_confirm(sk);
 	if (!dst)
 		goto reset;
 
-	dst_confirm(dst);
-
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
 	if (!tm) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b45101f..dc01f35 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -974,6 +974,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb_set_hash_from_sk(skb, sk);
 	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
+	skb->dst_pending_confirm = sk->sk_dst_pending_confirm;
+
 	/* Build TCP header and checksum it. */
 	th = (struct tcphdr *)skb->data;
 	th->source		= inet->inet_sport;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 5/7] net: add confirm_neigh method to dst_ops
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/arp.h      | 16 ++++++++++++++++
 include/net/dst.h      |  7 +++++++
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h    | 17 +++++++++++++++++
 net/ipv4/route.c       | 19 +++++++++++++++++++
 net/ipv6/route.c       | 16 ++++++++++++++++
 net/xfrm/xfrm_policy.c | 16 ++++++++++++++++
 7 files changed, 93 insertions(+)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..65619a2 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -35,6 +35,22 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32
 	return n;
 }
 
+static inline void __ipv4_confirm_neigh(struct net_device *dev, u32 key)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv4_neigh_lookup_noref(dev, key);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 void arp_init(void);
 int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 void arp_send(int type, int ptype, __be32 dest_ip,
diff --git a/include/net/dst.h b/include/net/dst.h
index 6835d22..3a3b34b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,6 +477,13 @@ static inline struct neighbour *dst_neigh_lookup_skb(const struct dst_entry *dst
 	return IS_ERR(n) ? NULL : n;
 }
 
+static inline void dst_confirm_neigh(const struct dst_entry *dst,
+				     const void *daddr)
+{
+	if (dst->ops->confirm_neigh)
+		dst->ops->confirm_neigh(dst, daddr);
+}
+
 static inline void dst_link_failure(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index a0d443c..fc1bdb4 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -33,6 +33,8 @@ struct dst_ops {
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
 						const void *daddr);
+	void			(*confirm_neigh)(const struct dst_entry *dst,
+						const void *daddr);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index d562a2f..8a02146 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -391,6 +391,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 	return n;
 }
 
+static inline void __ipv6_confirm_neigh(struct net_device *dev,
+					const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(dev, pkey);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 int ndisc_init(void);
 int ndisc_late_init(void);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fa5c037..682ff24 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -154,6 +154,7 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 					   struct sk_buff *skb,
 					   const void *daddr);
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
 	.family =		AF_INET,
@@ -168,6 +169,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	.redirect =		ip_do_redirect,
 	.local_out =		__ip_local_out,
 	.neigh_lookup =		ipv4_neigh_lookup,
+	.confirm_neigh =	ipv4_confirm_neigh,
 };
 
 #define ECN_OR_COST(class)	TC_PRIO_##class
@@ -461,6 +463,23 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&arp_tbl, pkey, dev);
 }
 
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	const __be32 *pkey = daddr;
+	const struct rtable *rt;
+
+	rt = (const struct rtable *)dst;
+	if (rt->rt_gateway)
+		pkey = (const __be32 *)&rt->rt_gateway;
+	else if (!daddr ||
+		 (rt->rt_flags &
+		  (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+		return;
+
+	__ipv4_confirm_neigh(dev, *(__force u32 *)pkey);
+}
+
 #define IP_IDENTS_SZ 2048u
 
 static atomic_t *ip_idents __read_mostly;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 890acac..4056d0e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -217,6 +217,21 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&nd_tbl, daddr, dst->dev);
 }
 
+static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	struct rt6_info *rt = (struct rt6_info *)dst;
+
+	daddr = choose_neigh_daddr(rt, NULL, daddr);
+	if (!daddr)
+		return;
+	if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
+		return;
+	if (ipv6_addr_is_multicast((const struct in6_addr *)daddr))
+		return;
+	__ipv6_confirm_neigh(dev, daddr);
+}
+
 static struct dst_ops ip6_dst_ops_template = {
 	.family			=	AF_INET6,
 	.gc			=	ip6_dst_gc,
@@ -233,6 +248,7 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	.redirect		=	rt6_do_redirect,
 	.local_out		=	__ip6_local_out,
 	.neigh_lookup		=	ip6_neigh_lookup,
+	.confirm_neigh		=	ip6_confirm_neigh,
 };
 
 static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 177e208..c010ee0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2856,6 +2856,20 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 	return dst->path->ops->neigh_lookup(dst, skb, daddr);
 }
 
+static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	const struct dst_entry *path = dst->path;
+
+	if (path == dst) {
+		dst->ops->confirm_neigh(dst, daddr);
+	} else {
+		/* daddr can be from different family and we need the
+		 * tunnel address. How to get it?
+		 */
+		path->ops->confirm_neigh(path, NULL);
+	}
+}
+
 int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 {
 	int err = 0;
@@ -2882,6 +2896,8 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 			dst_ops->link_failure = xfrm_link_failure;
 		if (likely(dst_ops->neigh_lookup == NULL))
 			dst_ops->neigh_lookup = xfrm_neigh_lookup;
+		if (likely(!dst_ops->confirm_neigh))
+			dst_ops->confirm_neigh = xfrm_confirm_neigh;
 		if (likely(afinfo->garbage_collect == NULL))
 			afinfo->garbage_collect = xfrm_garbage_collect_deferred;
 		rcu_assign_pointer(xfrm_policy_afinfo[afinfo->family], afinfo);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 5/7] net: add confirm_neigh method to dst_ops
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/arp.h      | 16 ++++++++++++++++
 include/net/dst.h      |  7 +++++++
 include/net/dst_ops.h  |  2 ++
 include/net/ndisc.h    | 17 +++++++++++++++++
 net/ipv4/route.c       | 19 +++++++++++++++++++
 net/ipv6/route.c       | 16 ++++++++++++++++
 net/xfrm/xfrm_policy.c | 16 ++++++++++++++++
 7 files changed, 93 insertions(+)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891..65619a2 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -35,6 +35,22 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32
 	return n;
 }
 
+static inline void __ipv4_confirm_neigh(struct net_device *dev, u32 key)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv4_neigh_lookup_noref(dev, key);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 void arp_init(void);
 int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 void arp_send(int type, int ptype, __be32 dest_ip,
diff --git a/include/net/dst.h b/include/net/dst.h
index 6835d22..3a3b34b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -477,6 +477,13 @@ static inline struct neighbour *dst_neigh_lookup_skb(const struct dst_entry *dst
 	return IS_ERR(n) ? NULL : n;
 }
 
+static inline void dst_confirm_neigh(const struct dst_entry *dst,
+				     const void *daddr)
+{
+	if (dst->ops->confirm_neigh)
+		dst->ops->confirm_neigh(dst, daddr);
+}
+
 static inline void dst_link_failure(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index a0d443c..fc1bdb4 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -33,6 +33,8 @@ struct dst_ops {
 	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
 						struct sk_buff *skb,
 						const void *daddr);
+	void			(*confirm_neigh)(const struct dst_entry *dst,
+						const void *daddr);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index d562a2f..8a02146 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -391,6 +391,23 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 	return n;
 }
 
+static inline void __ipv6_confirm_neigh(struct net_device *dev,
+					const void *pkey)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv6_neigh_lookup_noref(dev, pkey);
+	if (n) {
+		unsigned long now = jiffies;
+
+		/* avoid dirtying neighbour */
+		if (n->confirmed != now)
+			n->confirmed = now;
+	}
+	rcu_read_unlock_bh();
+}
+
 int ndisc_init(void);
 int ndisc_late_init(void);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fa5c037..682ff24 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -154,6 +154,7 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 					   struct sk_buff *skb,
 					   const void *daddr);
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
 	.family =		AF_INET,
@@ -168,6 +169,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	.redirect =		ip_do_redirect,
 	.local_out =		__ip_local_out,
 	.neigh_lookup =		ipv4_neigh_lookup,
+	.confirm_neigh =	ipv4_confirm_neigh,
 };
 
 #define ECN_OR_COST(class)	TC_PRIO_##class
@@ -461,6 +463,23 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&arp_tbl, pkey, dev);
 }
 
+static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	const __be32 *pkey = daddr;
+	const struct rtable *rt;
+
+	rt = (const struct rtable *)dst;
+	if (rt->rt_gateway)
+		pkey = (const __be32 *)&rt->rt_gateway;
+	else if (!daddr ||
+		 (rt->rt_flags &
+		  (RTCF_MULTICAST | RTCF_BROADCAST | RTCF_LOCAL)))
+		return;
+
+	__ipv4_confirm_neigh(dev, *(__force u32 *)pkey);
+}
+
 #define IP_IDENTS_SZ 2048u
 
 static atomic_t *ip_idents __read_mostly;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 890acac..4056d0e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -217,6 +217,21 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	return neigh_create(&nd_tbl, daddr, dst->dev);
 }
 
+static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	struct net_device *dev = dst->dev;
+	struct rt6_info *rt = (struct rt6_info *)dst;
+
+	daddr = choose_neigh_daddr(rt, NULL, daddr);
+	if (!daddr)
+		return;
+	if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
+		return;
+	if (ipv6_addr_is_multicast((const struct in6_addr *)daddr))
+		return;
+	__ipv6_confirm_neigh(dev, daddr);
+}
+
 static struct dst_ops ip6_dst_ops_template = {
 	.family			=	AF_INET6,
 	.gc			=	ip6_dst_gc,
@@ -233,6 +248,7 @@ static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
 	.redirect		=	rt6_do_redirect,
 	.local_out		=	__ip6_local_out,
 	.neigh_lookup		=	ip6_neigh_lookup,
+	.confirm_neigh		=	ip6_confirm_neigh,
 };
 
 static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 177e208..c010ee0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2856,6 +2856,20 @@ static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
 	return dst->path->ops->neigh_lookup(dst, skb, daddr);
 }
 
+static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+	const struct dst_entry *path = dst->path;
+
+	if (path = dst) {
+		dst->ops->confirm_neigh(dst, daddr);
+	} else {
+		/* daddr can be from different family and we need the
+		 * tunnel address. How to get it?
+		 */
+		path->ops->confirm_neigh(path, NULL);
+	}
+}
+
 int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 {
 	int err = 0;
@@ -2882,6 +2896,8 @@ int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
 			dst_ops->link_failure = xfrm_link_failure;
 		if (likely(dst_ops->neigh_lookup = NULL))
 			dst_ops->neigh_lookup = xfrm_neigh_lookup;
+		if (likely(!dst_ops->confirm_neigh))
+			dst_ops->confirm_neigh = xfrm_confirm_neigh;
 		if (likely(afinfo->garbage_collect = NULL))
 			afinfo->garbage_collect = xfrm_garbage_collect_deferred;
 		rcu_assign_pointer(xfrm_policy_afinfo[afinfo->family], afinfo);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/ip_output.c  |  6 ++++++
 net/ipv4/ping.c       |  3 ++-
 net/ipv4/raw.c        |  6 +++++-
 net/ipv4/udp.c        |  3 ++-
 net/ipv6/ip6_output.c |  6 ++++++
 net/ipv6/raw.c        |  6 +++++-
 net/ipv6/route.c      | 27 ++++++++++++++-------------
 net/ipv6/udp.c        |  3 ++-
 net/l2tp/l2tp_ip6.c   |  3 ++-
 9 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fbe63cc..3afab90 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -889,6 +889,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb->dst_pending_confirm = 1;
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1089,6 +1092,9 @@ static int __ip_append_data(struct sock *sk,
 			exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb->dst_pending_confirm = 1;
+
 			/*
 			 * Put the packet on the pending queue.
 			 */
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 86cca61..f3f6f50 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -848,7 +848,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2300fae..47c5451 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -383,6 +383,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 
 	sock_tx_timestamp(sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
 
+	if (flags & MSG_CONFIRM)
+		skb->dst_pending_confirm = 1;
+
 	skb->transport_header = skb->network_header;
 	err = -EFAULT;
 	if (memcpy_from_msg(iph, msg, length))
@@ -666,7 +669,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return len;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ca279b..22a1eb1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1113,7 +1113,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4->daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 285aa9f..efe8b3f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1145,6 +1145,9 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb->dst_pending_confirm = 1;
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1517,6 +1520,9 @@ static int __ip6_append_data(struct sock *sk,
 			exthdrlen = 0;
 			dst_exthdrlen = 0;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb->dst_pending_confirm = 1;
+
 			/*
 			 * Put the packet on the pending queue
 			 */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc2..0597548 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -650,6 +650,9 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	if (flags & MSG_CONFIRM)
+		skb->dst_pending_confirm = 1;
+
 	skb->transport_header = skb->network_header;
 	err = memcpy_from_msg(iph, msg, length);
 	if (err)
@@ -930,7 +933,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4056d0e..c8e3730 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1375,6 +1375,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
+	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	if (rt6->rt6i_flags & RTF_LOCAL)
@@ -1383,26 +1384,26 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;
 
-	dst_confirm(dst);
+	if (iph) {
+		daddr = &iph->daddr;
+		saddr = &iph->saddr;
+	} else if (sk) {
+		daddr = &sk->sk_v6_daddr;
+		saddr = &inet6_sk(sk)->saddr;
+	} else {
+		daddr = NULL;
+		saddr = NULL;
+	}
+	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
 
 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
-	} else {
-		const struct in6_addr *daddr, *saddr;
+	} else if (daddr) {
 		struct rt6_info *nrt6;
 
-		if (iph) {
-			daddr = &iph->daddr;
-			saddr = &iph->saddr;
-		} else if (sk) {
-			daddr = &sk->sk_v6_daddr;
-			saddr = &inet6_sk(sk)->saddr;
-		} else {
-			return;
-		}
 		nrt6 = ip6_rt_cache_alloc(rt6, daddr, saddr);
 		if (nrt6) {
 			rt6_do_update_pmtu(nrt6, mtu);
@@ -2274,7 +2275,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 	 * Look, redirects are sent only in response to data packets,
 	 * so that this nexthop apparently is reachable. --ANK
 	 */
-	dst_confirm(&rt->dst);
+	dst_confirm_neigh(&rt->dst, &ipv6_hdr(skb)->saddr);
 
 	neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
 	if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 649efc2..9956b7e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1295,7 +1295,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index f092ac4..060dbae 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -658,7 +658,8 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err < 0 ? err : len;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/ipv4/ip_output.c  |  6 ++++++
 net/ipv4/ping.c       |  3 ++-
 net/ipv4/raw.c        |  6 +++++-
 net/ipv4/udp.c        |  3 ++-
 net/ipv6/ip6_output.c |  6 ++++++
 net/ipv6/raw.c        |  6 +++++-
 net/ipv6/route.c      | 27 ++++++++++++++-------------
 net/ipv6/udp.c        |  3 ++-
 net/l2tp/l2tp_ip6.c   |  3 ++-
 9 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fbe63cc..3afab90 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -889,6 +889,9 @@ static inline int ip_ufo_append_data(struct sock *sk,
 
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb->dst_pending_confirm = 1;
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1089,6 +1092,9 @@ static int __ip_append_data(struct sock *sk,
 			exthdrlen = 0;
 			csummode = CHECKSUM_NONE;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb->dst_pending_confirm = 1;
+
 			/*
 			 * Put the packet on the pending queue.
 			 */
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 86cca61..f3f6f50 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -848,7 +848,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2300fae..47c5451 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -383,6 +383,9 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 
 	sock_tx_timestamp(sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
 
+	if (flags & MSG_CONFIRM)
+		skb->dst_pending_confirm = 1;
+
 	skb->transport_header = skb->network_header;
 	err = -EFAULT;
 	if (memcpy_from_msg(iph, msg, length))
@@ -666,7 +669,8 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return len;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9ca279b..22a1eb1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1113,7 +1113,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(&rt->dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(&rt->dst, &fl4->daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 285aa9f..efe8b3f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1145,6 +1145,9 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->csum = 0;
 
+		if (flags & MSG_CONFIRM)
+			skb->dst_pending_confirm = 1;
+
 		__skb_queue_tail(queue, skb);
 	} else if (skb_is_gso(skb)) {
 		goto append;
@@ -1517,6 +1520,9 @@ static int __ip6_append_data(struct sock *sk,
 			exthdrlen = 0;
 			dst_exthdrlen = 0;
 
+			if ((flags & MSG_CONFIRM) && !skb_prev)
+				skb->dst_pending_confirm = 1;
+
 			/*
 			 * Put the packet on the pending queue
 			 */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc2..0597548 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -650,6 +650,9 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	if (flags & MSG_CONFIRM)
+		skb->dst_pending_confirm = 1;
+
 	skb->transport_header = skb->network_header;
 	err = memcpy_from_msg(iph, msg, length);
 	if (err)
@@ -930,7 +933,8 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4056d0e..c8e3730 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1375,6 +1375,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
+	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	if (rt6->rt6i_flags & RTF_LOCAL)
@@ -1383,26 +1384,26 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;
 
-	dst_confirm(dst);
+	if (iph) {
+		daddr = &iph->daddr;
+		saddr = &iph->saddr;
+	} else if (sk) {
+		daddr = &sk->sk_v6_daddr;
+		saddr = &inet6_sk(sk)->saddr;
+	} else {
+		daddr = NULL;
+		saddr = NULL;
+	}
+	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
 
 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
-	} else {
-		const struct in6_addr *daddr, *saddr;
+	} else if (daddr) {
 		struct rt6_info *nrt6;
 
-		if (iph) {
-			daddr = &iph->daddr;
-			saddr = &iph->saddr;
-		} else if (sk) {
-			daddr = &sk->sk_v6_daddr;
-			saddr = &inet6_sk(sk)->saddr;
-		} else {
-			return;
-		}
 		nrt6 = ip6_rt_cache_alloc(rt6, daddr, saddr);
 		if (nrt6) {
 			rt6_do_update_pmtu(nrt6, mtu);
@@ -2274,7 +2275,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 	 * Look, redirects are sent only in response to data packets,
 	 * so that this nexthop apparently is reachable. --ANK
 	 */
-	dst_confirm(&rt->dst);
+	dst_confirm_neigh(&rt->dst, &ipv6_hdr(skb)->saddr);
 
 	neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
 	if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 649efc2..9956b7e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1295,7 +1295,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index f092ac4..060dbae 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -658,7 +658,8 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err < 0 ? err : len;
 
 do_confirm:
-	dst_confirm(dst);
+	if (msg->msg_flags & MSG_PROBE)
+		dst_confirm_neigh(dst, &fl6.daddr);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 7/7] net: pending_confirm is not used anymore
  2016-12-18 20:56 ` Julian Anastasov
@ 2016-12-18 20:56   ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/dst.h | 14 ++------------
 net/core/dst.c    |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3a3b34b..84a1043 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,8 +59,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE		0x0100
 #define DST_METADATA		0x0200
 
-	unsigned short		pending_confirm;
-
 	short			error;
 
 	/* A non-zero value of dst->obsolete forces by-hand validation
@@ -78,6 +76,8 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL	-2
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
+	unsigned short		__pad3;
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	__u32			tclassid;
 #else
@@ -440,7 +440,6 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-	dst->pending_confirm = 1;
 }
 
 static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
@@ -448,15 +447,6 @@ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
 {
 	const struct hh_cache *hh;
 
-	if (dst->pending_confirm) {
-		unsigned long now = jiffies;
-
-		dst->pending_confirm = 0;
-		/* avoid dirtying neighbour */
-		if (n->confirmed != now)
-			n->confirmed = now;
-	}
-
 	hh = &n->hh;
 	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
 		return neigh_hh_output(hh, skb);
diff --git a/net/core/dst.c b/net/core/dst.c
index b5cbbe0..960e503 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
-	dst->pending_confirm = 0;
 	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH RFC net-next 7/7] net: pending_confirm is not used anymore
@ 2016-12-18 20:56   ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-18 20:56 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, YueHaibing

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
As last step, we can remove the pending_confirm flag.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/dst.h | 14 ++------------
 net/core/dst.c    |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3a3b34b..84a1043 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,8 +59,6 @@ struct dst_entry {
 #define DST_XFRM_QUEUE		0x0100
 #define DST_METADATA		0x0200
 
-	unsigned short		pending_confirm;
-
 	short			error;
 
 	/* A non-zero value of dst->obsolete forces by-hand validation
@@ -78,6 +76,8 @@ struct dst_entry {
 #define DST_OBSOLETE_KILL	-2
 	unsigned short		header_len;	/* more space at head required */
 	unsigned short		trailer_len;	/* space to reserve at tail */
+	unsigned short		__pad3;
+
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	__u32			tclassid;
 #else
@@ -440,7 +440,6 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-	dst->pending_confirm = 1;
 }
 
 static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
@@ -448,15 +447,6 @@ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
 {
 	const struct hh_cache *hh;
 
-	if (dst->pending_confirm) {
-		unsigned long now = jiffies;
-
-		dst->pending_confirm = 0;
-		/* avoid dirtying neighbour */
-		if (n->confirmed != now)
-			n->confirmed = now;
-	}
-
 	hh = &n->hh;
 	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
 		return neigh_hh_output(hh, skb);
diff --git a/net/core/dst.c b/net/core/dst.c
index b5cbbe0..960e503 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,6 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
-	dst->pending_confirm = 0;
 	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-18 20:56   ` Julian Anastasov
@ 2016-12-19 16:17     ` Eric Dumazet
  -1 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2016-12-19 16:17 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:

>  
> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
> +{
> +	if (unlikely(skb->dst_pending_confirm)) {
> +		struct sock *sk = skb->sk;
> +		unsigned long now = jiffies;
> +
> +		/* avoid dirtying neighbour */
> +		if (n->confirmed != now)
> +			n->confirmed = now;
> +		if (sk && sk->sk_dst_pending_confirm)
> +			sk->sk_dst_pending_confirm = 0;
> +	}
> +}
> +

I am still digesting this awesome patch series ;)

Not sure why you used an unlikely() here. TCP for example would hit this
path quite often.

So considering sk_dst_pending_confirm might be dirtied quite often,

I am not sure why you placed it in the cache line that contains
sk_rx_dst (in 1st patch)

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-19 16:17     ` Eric Dumazet
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2016-12-19 16:17 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:

>  
> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
> +{
> +	if (unlikely(skb->dst_pending_confirm)) {
> +		struct sock *sk = skb->sk;
> +		unsigned long now = jiffies;
> +
> +		/* avoid dirtying neighbour */
> +		if (n->confirmed != now)
> +			n->confirmed = now;
> +		if (sk && sk->sk_dst_pending_confirm)
> +			sk->sk_dst_pending_confirm = 0;
> +	}
> +}
> +

I am still digesting this awesome patch series ;)

Not sure why you used an unlikely() here. TCP for example would hit this
path quite often.

So considering sk_dst_pending_confirm might be dirtied quite often,

I am not sure why you placed it in the cache line that contains
sk_rx_dst (in 1st patch)

Thanks.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-19 16:17     ` Eric Dumazet
@ 2016-12-19 16:36       ` Hannes Frederic Sowa
  -1 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-19 16:36 UTC (permalink / raw)
  To: Eric Dumazet, Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On 19.12.2016 17:17, Eric Dumazet wrote:
> On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
> 
>>  
>> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
>> +{
>> +	if (unlikely(skb->dst_pending_confirm)) {
>> +		struct sock *sk = skb->sk;
>> +		unsigned long now = jiffies;
>> +
>> +		/* avoid dirtying neighbour */
>> +		if (n->confirmed != now)
>> +			n->confirmed = now;
>> +		if (sk && sk->sk_dst_pending_confirm)
>> +			sk->sk_dst_pending_confirm = 0;
>> +	}
>> +}
>> +
> 
> I am still digesting this awesome patch series ;)
> 
> Not sure why you used an unlikely() here. TCP for example would hit this
> path quite often.
> 
> So considering sk_dst_pending_confirm might be dirtied quite often,
> 
> I am not sure why you placed it in the cache line that contains
> sk_rx_dst (in 1st patch)

Because they have to stay synchronized?

If we modify sk_rx_dst, we automatically also must clear
pending_confirm, otherwise we might end up confirming a wrong neighbor.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-19 16:36       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-19 16:36 UTC (permalink / raw)
  To: Eric Dumazet, Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On 19.12.2016 17:17, Eric Dumazet wrote:
> On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
> 
>>  
>> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
>> +{
>> +	if (unlikely(skb->dst_pending_confirm)) {
>> +		struct sock *sk = skb->sk;
>> +		unsigned long now = jiffies;
>> +
>> +		/* avoid dirtying neighbour */
>> +		if (n->confirmed != now)
>> +			n->confirmed = now;
>> +		if (sk && sk->sk_dst_pending_confirm)
>> +			sk->sk_dst_pending_confirm = 0;
>> +	}
>> +}
>> +
> 
> I am still digesting this awesome patch series ;)
> 
> Not sure why you used an unlikely() here. TCP for example would hit this
> path quite often.
> 
> So considering sk_dst_pending_confirm might be dirtied quite often,
> 
> I am not sure why you placed it in the cache line that contains
> sk_rx_dst (in 1st patch)

Because they have to stay synchronized?

If we modify sk_rx_dst, we automatically also must clear
pending_confirm, otherwise we might end up confirming a wrong neighbor.

Bye,
Hannes


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-19 16:36       ` Hannes Frederic Sowa
@ 2016-12-19 16:40         ` Eric Dumazet
  -1 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2016-12-19 16:40 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Julian Anastasov, netdev, linux-sctp, YueHaibing

On Mon, 2016-12-19 at 17:36 +0100, Hannes Frederic Sowa wrote:
> On 19.12.2016 17:17, Eric Dumazet wrote:
> > On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
> > 
> >>  
> >> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
> >> +{
> >> +	if (unlikely(skb->dst_pending_confirm)) {
> >> +		struct sock *sk = skb->sk;
> >> +		unsigned long now = jiffies;
> >> +
> >> +		/* avoid dirtying neighbour */
> >> +		if (n->confirmed != now)
> >> +			n->confirmed = now;
> >> +		if (sk && sk->sk_dst_pending_confirm)
> >> +			sk->sk_dst_pending_confirm = 0;
> >> +	}
> >> +}
> >> +
> > 
> > I am still digesting this awesome patch series ;)
> > 
> > Not sure why you used an unlikely() here. TCP for example would hit this
> > path quite often.
> > 
> > So considering sk_dst_pending_confirm might be dirtied quite often,
> > 
> > I am not sure why you placed it in the cache line that contains
> > sk_rx_dst (in 1st patch)
> 
> Because they have to stay synchronized?
> 
> If we modify sk_rx_dst, we automatically also must clear
> pending_confirm, otherwise we might end up confirming a wrong neighbor.

Your answer makes little sense really...

For most TCP flows, we set sk_rx_dst exactly once.

Hardly a good reason to have these in the same cache line.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-19 16:40         ` Eric Dumazet
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Dumazet @ 2016-12-19 16:40 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Julian Anastasov, netdev, linux-sctp, YueHaibing

On Mon, 2016-12-19 at 17:36 +0100, Hannes Frederic Sowa wrote:
> On 19.12.2016 17:17, Eric Dumazet wrote:
> > On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
> > 
> >>  
> >> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
> >> +{
> >> +	if (unlikely(skb->dst_pending_confirm)) {
> >> +		struct sock *sk = skb->sk;
> >> +		unsigned long now = jiffies;
> >> +
> >> +		/* avoid dirtying neighbour */
> >> +		if (n->confirmed != now)
> >> +			n->confirmed = now;
> >> +		if (sk && sk->sk_dst_pending_confirm)
> >> +			sk->sk_dst_pending_confirm = 0;
> >> +	}
> >> +}
> >> +
> > 
> > I am still digesting this awesome patch series ;)
> > 
> > Not sure why you used an unlikely() here. TCP for example would hit this
> > path quite often.
> > 
> > So considering sk_dst_pending_confirm might be dirtied quite often,
> > 
> > I am not sure why you placed it in the cache line that contains
> > sk_rx_dst (in 1st patch)
> 
> Because they have to stay synchronized?
> 
> If we modify sk_rx_dst, we automatically also must clear
> pending_confirm, otherwise we might end up confirming a wrong neighbor.

Your answer makes little sense really...

For most TCP flows, we set sk_rx_dst exactly once.

Hardly a good reason to have these in the same cache line.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-19 16:40         ` Eric Dumazet
@ 2016-12-19 16:53           ` Hannes Frederic Sowa
  -1 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-19 16:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, netdev, linux-sctp, YueHaibing

On 19.12.2016 17:40, Eric Dumazet wrote:
> On Mon, 2016-12-19 at 17:36 +0100, Hannes Frederic Sowa wrote:
>> On 19.12.2016 17:17, Eric Dumazet wrote:
>>> On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
>>>
>>>>  
>>>> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
>>>> +{
>>>> +	if (unlikely(skb->dst_pending_confirm)) {
>>>> +		struct sock *sk = skb->sk;
>>>> +		unsigned long now = jiffies;
>>>> +
>>>> +		/* avoid dirtying neighbour */
>>>> +		if (n->confirmed != now)
>>>> +			n->confirmed = now;
>>>> +		if (sk && sk->sk_dst_pending_confirm)
>>>> +			sk->sk_dst_pending_confirm = 0;
>>>> +	}
>>>> +}
>>>> +
>>>
>>> I am still digesting this awesome patch series ;)
>>>
>>> Not sure why you used an unlikely() here. TCP for example would hit this
>>> path quite often.
>>>
>>> So considering sk_dst_pending_confirm might be dirtied quite often,
>>>
>>> I am not sure why you placed it in the cache line that contains
>>> sk_rx_dst (in 1st patch)
>>
>> Because they have to stay synchronized?
>>
>> If we modify sk_rx_dst, we automatically also must clear
>> pending_confirm, otherwise we might end up confirming a wrong neighbor.
> 
> Your answer makes little sense really...
> 
> For most TCP flows, we set sk_rx_dst exactly once.
> 
> Hardly a good reason to have these in the same cache line.

Right :) , and I didn't actually wanted to make an argument in favor of
that. Just noted they are probably semantically grouped together as an
explanation.

Sorry,
Hannes

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-19 16:53           ` Hannes Frederic Sowa
  0 siblings, 0 replies; 34+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-19 16:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, netdev, linux-sctp, YueHaibing

On 19.12.2016 17:40, Eric Dumazet wrote:
> On Mon, 2016-12-19 at 17:36 +0100, Hannes Frederic Sowa wrote:
>> On 19.12.2016 17:17, Eric Dumazet wrote:
>>> On Sun, 2016-12-18 at 22:56 +0200, Julian Anastasov wrote:
>>>
>>>>  
>>>> +static inline void sock_confirm_neigh(struct sk_buff *skb, struct neighbour *n)
>>>> +{
>>>> +	if (unlikely(skb->dst_pending_confirm)) {
>>>> +		struct sock *sk = skb->sk;
>>>> +		unsigned long now = jiffies;
>>>> +
>>>> +		/* avoid dirtying neighbour */
>>>> +		if (n->confirmed != now)
>>>> +			n->confirmed = now;
>>>> +		if (sk && sk->sk_dst_pending_confirm)
>>>> +			sk->sk_dst_pending_confirm = 0;
>>>> +	}
>>>> +}
>>>> +
>>>
>>> I am still digesting this awesome patch series ;)
>>>
>>> Not sure why you used an unlikely() here. TCP for example would hit this
>>> path quite often.
>>>
>>> So considering sk_dst_pending_confirm might be dirtied quite often,
>>>
>>> I am not sure why you placed it in the cache line that contains
>>> sk_rx_dst (in 1st patch)
>>
>> Because they have to stay synchronized?
>>
>> If we modify sk_rx_dst, we automatically also must clear
>> pending_confirm, otherwise we might end up confirming a wrong neighbor.
> 
> Your answer makes little sense really...
> 
> For most TCP flows, we set sk_rx_dst exactly once.
> 
> Hardly a good reason to have these in the same cache line.

Right :) , and I didn't actually wanted to make an argument in favor of
that. Just noted they are probably semantically grouped together as an
explanation.

Sorry,
Hannes


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-19 16:17     ` Eric Dumazet
@ 2016-12-19 20:37       ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-19 20:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-sctp, YueHaibing


	Hello,

On Mon, 19 Dec 2016, Eric Dumazet wrote:

> I am still digesting this awesome patch series ;)

	Thanks. I don't feel quite comfortable with some
of the changes (mostly XFRM, dst_confirm usage in CXGB) and
I hope the discussion can provide adequate solution.

> Not sure why you used an unlikely() here. TCP for example would hit this
> path quite often.

	I was not sure, may be because ACKs can come with lower
rate than the sent packets. Also because non-TCP rarely uses
MSG_CONFIRM. If you still think it is better without unlikely,
I'll remove it.

> So considering sk_dst_pending_confirm might be dirtied quite often,
> 
> I am not sure why you placed it in the cache line that contains
> sk_rx_dst (in 1st patch)

	I saw your recent changes and was worried if the
sk_dst_confirm() calling on RX can cause unwanted dirtying of
additional cache line. My preliminary analyze pointed
sk_omem_alloc as good candidate for moving to next cache
line. I know how critical is to properly place the new flags,
so I really need recommendations about this.

Regards

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2016-12-19 20:37       ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-19 20:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-sctp, YueHaibing


	Hello,

On Mon, 19 Dec 2016, Eric Dumazet wrote:

> I am still digesting this awesome patch series ;)

	Thanks. I don't feel quite comfortable with some
of the changes (mostly XFRM, dst_confirm usage in CXGB) and
I hope the discussion can provide adequate solution.

> Not sure why you used an unlikely() here. TCP for example would hit this
> path quite often.

	I was not sure, may be because ACKs can come with lower
rate than the sent packets. Also because non-TCP rarely uses
MSG_CONFIRM. If you still think it is better without unlikely,
I'll remove it.

> So considering sk_dst_pending_confirm might be dirtied quite often,
> 
> I am not sure why you placed it in the cache line that contains
> sk_rx_dst (in 1st patch)

	I saw your recent changes and was worried if the
sk_dst_confirm() calling on RX can cause unwanted dirtying of
additional cache line. My preliminary analyze pointed
sk_omem_alloc as good candidate for moving to next cache
line. I know how critical is to properly place the new flags,
so I really need recommendations about this.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
  2016-12-18 20:56   ` Julian Anastasov
@ 2016-12-21 14:11     ` Neil Horman
  -1 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2016-12-21 14:11 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sun, Dec 18, 2016 at 10:56:32PM +0200, Julian Anastasov wrote:
> Add new transport flag to allow sockets to confirm neighbour.
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> The flag is propagated from transport to every packet.
> It is reset when cached dst is reset.
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  include/net/sctp/sctp.h    |  6 ++----
>  include/net/sctp/structs.h |  4 ++++
>  net/sctp/associola.c       |  3 +--
>  net/sctp/output.c          | 10 +++++++++-
>  net/sctp/outqueue.c        |  2 +-
>  net/sctp/sm_make_chunk.c   |  6 ++----
>  net/sctp/sm_sideeffect.c   |  2 +-
>  net/sctp/socket.c          |  4 ++--
>  net/sctp/transport.c       | 18 +++++++++++++++++-
>  9 files changed, 39 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index f0dcaeb..85d9083 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -586,10 +586,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
>   */
>  static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
>  {
> -	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
> -		dst_release(t->dst);
> -		t->dst = NULL;
> -	}
> +	if (t->dst && !dst_check(t->dst, t->dst_cookie))
> +		sctp_transport_dst_release(t);
>  
>  	return t->dst;
>  }
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 92daabd..e842e84 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -838,6 +838,8 @@ struct sctp_transport {
>  
>  	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
>  
> +	__u32 dst_pending_confirm;	/* need to confirm neighbour */
> +
>  	/* Destination */
>  	struct dst_entry *dst;
>  	/* Source address. */
> @@ -980,6 +982,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
>  void sctp_transport_reset(struct sctp_transport *);
>  void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
>  void sctp_transport_immediate_rtx(struct sctp_transport *);
> +void sctp_transport_dst_release(struct sctp_transport *t);
> +void sctp_transport_dst_confirm(struct sctp_transport *t);
>  
>  
>  /* This is the structure we use to queue packets as they come into
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 68428e1..7bd26e0 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -820,8 +820,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		if (transport->state != SCTP_UNCONFIRMED)
>  			transport->state = SCTP_INACTIVE;
>  		else {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  			ulp_notify = false;
>  		}
>  
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index f5320a8..4684a00 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -550,6 +550,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  	struct sctp_association *asoc = tp->asoc;
>  	struct sctp_chunk *chunk, *tmp;
>  	int pkt_count, gso = 0;
> +	int confirm;
>  	struct dst_entry *dst;
>  	struct sk_buff *head;
>  	struct sctphdr *sh;
> @@ -628,7 +629,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  			asoc->peer.last_sent_to = tp;
>  	}
>  	head->ignore_df = packet->ipfragok;
> -	tp->af_specific->sctp_xmit(head, tp);
> +	confirm = tp->dst_pending_confirm;
> +	if (confirm)
> +		head->dst_pending_confirm = 1;
Why not use your confirm helper function here?

> +	/* neighbour should be confirmed on successful transmission or
> +	 * positive error
> +	 */
> +	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
> +		tp->dst_pending_confirm = 0;
>  
>  out:
>  	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index e540826..8234799 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1641,7 +1641,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  
>  		if (forward_progress) {
>  			if (transport->dst)
> -				dst_confirm(transport->dst);
> +				sctp_transport_dst_confirm(transport);
>  		}
>  	}
>  
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 9e9690b..6fb15bf 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3317,8 +3317,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	case SCTP_PARAM_DEL_IP:
> @@ -3332,8 +3331,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	default:
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index c345bf1..9255b29 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -723,7 +723,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	 * forward progress.
>  	 */
>  	if (t->dst)
> -		dst_confirm(t->dst);
> +		sctp_transport_dst_confirm(t);
>  
>  	/* The receiver of the HEARTBEAT ACK should also perform an
>  	 * RTT measurement for that destination transport address
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 318c678..9ee676a 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  			list_for_each_entry(trans,
>  			    &asoc->peer.transport_addr_list, transports) {
>  				/* Clear the source and route cache */
> -				dst_release(trans->dst);
> +				sctp_transport_dst_release(trans);
>  				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
>  				    2*asoc->pathmtu, 4380));
>  				trans->ssthresh = asoc->peer.i.a_rwnd;
> @@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 */
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  					transports) {
> -			dst_release(transport->dst);
> +			sctp_transport_dst_release(transport);
>  			sctp_transport_route(transport, NULL,
>  					     sctp_sk(asoc->base.sk));
>  		}
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index ce54dce..db66514 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -227,7 +227,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
>  {
>  	/* If we don't have a fresh route, look one up */
>  	if (!transport->dst || transport->dst->obsolete) {
> -		dst_release(transport->dst);
> +		sctp_transport_dst_release(transport);
>  		transport->af_specific->get_dst(transport, &transport->saddr,
>  						&transport->fl, sk);
>  	}
> @@ -659,3 +659,19 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
>  			sctp_transport_hold(t);
>  	}
>  }
> +
> +/* Drop dst */
> +void sctp_transport_dst_release(struct sctp_transport *t)
> +{
> +	dst_release(t->dst);
> +	t->dst = NULL;
> +	t->dst_pending_confirm = 0;
> +}
> +
> +/* Schedule neighbour confirm */
> +void sctp_transport_dst_confirm(struct sctp_transport *t)
> +{
> +	if (!t->dst_pending_confirm)
> +		t->dst_pending_confirm = 1;
I'm not sure why you check it for being zero before setting it here, just set it
unilaterally instead, or are you trying to avoid dirtying a cacheline?

Neil

> +}
> +
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
@ 2016-12-21 14:11     ` Neil Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Neil Horman @ 2016-12-21 14:11 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev, linux-sctp, YueHaibing

On Sun, Dec 18, 2016 at 10:56:32PM +0200, Julian Anastasov wrote:
> Add new transport flag to allow sockets to confirm neighbour.
> When same struct dst_entry can be used for many different
> neighbours we can not use it for pending confirmations.
> The flag is propagated from transport to every packet.
> It is reset when cached dst is reset.
> 
> Reported-by: YueHaibing <yuehaibing@huawei.com>
> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
> Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  include/net/sctp/sctp.h    |  6 ++----
>  include/net/sctp/structs.h |  4 ++++
>  net/sctp/associola.c       |  3 +--
>  net/sctp/output.c          | 10 +++++++++-
>  net/sctp/outqueue.c        |  2 +-
>  net/sctp/sm_make_chunk.c   |  6 ++----
>  net/sctp/sm_sideeffect.c   |  2 +-
>  net/sctp/socket.c          |  4 ++--
>  net/sctp/transport.c       | 18 +++++++++++++++++-
>  9 files changed, 39 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index f0dcaeb..85d9083 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -586,10 +586,8 @@ static inline void sctp_v4_map_v6(union sctp_addr *addr)
>   */
>  static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *t)
>  {
> -	if (t->dst && !dst_check(t->dst, t->dst_cookie)) {
> -		dst_release(t->dst);
> -		t->dst = NULL;
> -	}
> +	if (t->dst && !dst_check(t->dst, t->dst_cookie))
> +		sctp_transport_dst_release(t);
>  
>  	return t->dst;
>  }
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 92daabd..e842e84 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -838,6 +838,8 @@ struct sctp_transport {
>  
>  	__u32 burst_limited;	/* Holds old cwnd when max.burst is applied */
>  
> +	__u32 dst_pending_confirm;	/* need to confirm neighbour */
> +
>  	/* Destination */
>  	struct dst_entry *dst;
>  	/* Source address. */
> @@ -980,6 +982,8 @@ void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
>  void sctp_transport_reset(struct sctp_transport *);
>  void sctp_transport_update_pmtu(struct sock *, struct sctp_transport *, u32);
>  void sctp_transport_immediate_rtx(struct sctp_transport *);
> +void sctp_transport_dst_release(struct sctp_transport *t);
> +void sctp_transport_dst_confirm(struct sctp_transport *t);
>  
>  
>  /* This is the structure we use to queue packets as they come into
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 68428e1..7bd26e0 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -820,8 +820,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		if (transport->state != SCTP_UNCONFIRMED)
>  			transport->state = SCTP_INACTIVE;
>  		else {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  			ulp_notify = false;
>  		}
>  
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index f5320a8..4684a00 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -550,6 +550,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  	struct sctp_association *asoc = tp->asoc;
>  	struct sctp_chunk *chunk, *tmp;
>  	int pkt_count, gso = 0;
> +	int confirm;
>  	struct dst_entry *dst;
>  	struct sk_buff *head;
>  	struct sctphdr *sh;
> @@ -628,7 +629,14 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
>  			asoc->peer.last_sent_to = tp;
>  	}
>  	head->ignore_df = packet->ipfragok;
> -	tp->af_specific->sctp_xmit(head, tp);
> +	confirm = tp->dst_pending_confirm;
> +	if (confirm)
> +		head->dst_pending_confirm = 1;
Why not use your confirm helper function here?

> +	/* neighbour should be confirmed on successful transmission or
> +	 * positive error
> +	 */
> +	if (tp->af_specific->sctp_xmit(head, tp) >= 0 && confirm)
> +		tp->dst_pending_confirm = 0;
>  
>  out:
>  	list_for_each_entry_safe(chunk, tmp, &packet->chunk_list, list) {
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index e540826..8234799 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1641,7 +1641,7 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>  
>  		if (forward_progress) {
>  			if (transport->dst)
> -				dst_confirm(transport->dst);
> +				sctp_transport_dst_confirm(transport);
>  		}
>  	}
>  
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 9e9690b..6fb15bf 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -3317,8 +3317,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	case SCTP_PARAM_DEL_IP:
> @@ -3332,8 +3331,7 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
>  		local_bh_enable();
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  				transports) {
> -			dst_release(transport->dst);
> -			transport->dst = NULL;
> +			sctp_transport_dst_release(transport);
>  		}
>  		break;
>  	default:
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index c345bf1..9255b29 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -723,7 +723,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>  	 * forward progress.
>  	 */
>  	if (t->dst)
> -		dst_confirm(t->dst);
> +		sctp_transport_dst_confirm(t);
>  
>  	/* The receiver of the HEARTBEAT ACK should also perform an
>  	 * RTT measurement for that destination transport address
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 318c678..9ee676a 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -588,7 +588,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
>  			list_for_each_entry(trans,
>  			    &asoc->peer.transport_addr_list, transports) {
>  				/* Clear the source and route cache */
> -				dst_release(trans->dst);
> +				sctp_transport_dst_release(trans);
>  				trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
>  				    2*asoc->pathmtu, 4380));
>  				trans->ssthresh = asoc->peer.i.a_rwnd;
> @@ -839,7 +839,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
>  		 */
>  		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
>  					transports) {
> -			dst_release(transport->dst);
> +			sctp_transport_dst_release(transport);
>  			sctp_transport_route(transport, NULL,
>  					     sctp_sk(asoc->base.sk));
>  		}
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index ce54dce..db66514 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -227,7 +227,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
>  {
>  	/* If we don't have a fresh route, look one up */
>  	if (!transport->dst || transport->dst->obsolete) {
> -		dst_release(transport->dst);
> +		sctp_transport_dst_release(transport);
>  		transport->af_specific->get_dst(transport, &transport->saddr,
>  						&transport->fl, sk);
>  	}
> @@ -659,3 +659,19 @@ void sctp_transport_immediate_rtx(struct sctp_transport *t)
>  			sctp_transport_hold(t);
>  	}
>  }
> +
> +/* Drop dst */
> +void sctp_transport_dst_release(struct sctp_transport *t)
> +{
> +	dst_release(t->dst);
> +	t->dst = NULL;
> +	t->dst_pending_confirm = 0;
> +}
> +
> +/* Schedule neighbour confirm */
> +void sctp_transport_dst_confirm(struct sctp_transport *t)
> +{
> +	if (!t->dst_pending_confirm)
> +		t->dst_pending_confirm = 1;
I'm not sure why you check it for being zero before setting it here, just set it
unilaterally instead, or are you trying to avoid dirtying a cacheline?

Neil

> +}
> +
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
  2016-12-21 14:11     ` Neil Horman
@ 2016-12-21 20:19       ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-21 20:19 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, linux-sctp, YueHaibing


	Hello,

On Wed, 21 Dec 2016, Neil Horman wrote:

> On Sun, Dec 18, 2016 at 10:56:32PM +0200, Julian Anastasov wrote:

> > -	tp->af_specific->sctp_xmit(head, tp);
> > +	confirm = tp->dst_pending_confirm;
> > +	if (confirm)
> > +		head->dst_pending_confirm = 1;
> Why not use your confirm helper function here?

	It is for sk, I can add a skb_set_dst_pending_confirm(skb,int)
helper, if needed.

> > +	if (!t->dst_pending_confirm)
> > +		t->dst_pending_confirm = 1;
> I'm not sure why you check it for being zero before setting it here, just set it
> unilaterally instead, or are you trying to avoid dirtying a cacheline?

	Yes, no other reason. I'll change it in the next version.

Regards

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag
@ 2016-12-21 20:19       ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2016-12-21 20:19 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, linux-sctp, YueHaibing


	Hello,

On Wed, 21 Dec 2016, Neil Horman wrote:

> On Sun, Dec 18, 2016 at 10:56:32PM +0200, Julian Anastasov wrote:

> > -	tp->af_specific->sctp_xmit(head, tp);
> > +	confirm = tp->dst_pending_confirm;
> > +	if (confirm)
> > +		head->dst_pending_confirm = 1;
> Why not use your confirm helper function here?

	It is for sk, I can add a skb_set_dst_pending_confirm(skb,int)
helper, if needed.

> > +	if (!t->dst_pending_confirm)
> > +		t->dst_pending_confirm = 1;
> I'm not sure why you check it for being zero before setting it here, just set it
> unilaterally instead, or are you trying to avoid dirtying a cacheline?

	Yes, no other reason. I'll change it in the next version.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2016-12-19 20:37       ` Julian Anastasov
@ 2017-01-20  3:39         ` YueHaibing
  -1 siblings, 0 replies; 34+ messages in thread
From: YueHaibing @ 2017-01-20  3:39 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Eric Dumazet, netdev, linux-sctp


On 2016/12/20 4:37, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 19 Dec 2016, Eric Dumazet wrote:
> 
>> I am still digesting this awesome patch series ;)
> 
> 	Thanks. I don't feel quite comfortable with some
> of the changes (mostly XFRM, dst_confirm usage in CXGB) and
> I hope the discussion can provide adequate solution.
> 
>> Not sure why you used an unlikely() here. TCP for example would hit this
>> path quite often.
> 
> 	I was not sure, may be because ACKs can come with lower
> rate than the sent packets. Also because non-TCP rarely uses
> MSG_CONFIRM. If you still think it is better without unlikely,
> I'll remove it.
> 
>> So considering sk_dst_pending_confirm might be dirtied quite often,further 
>>
>> I am not sure why you placed it in the cache line that contains
>> sk_rx_dst (in 1st patch)
> 
> 	I saw your recent changes and was worried if the
> sk_dst_confirm() calling on RX can cause unwanted dirtying of
> additional cache line. My preliminary analyze pointed
> sk_omem_alloc as good candidate for moving to next cache
> line. I know how critical is to properly place the new flags,
> so I really need recommendations about this.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 
> .
> 

Sorry for so late reply.
I have test your new patch, It works well in my scene.

Everybody,Is there any further comment about this awesome patch series?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2017-01-20  3:39         ` YueHaibing
  0 siblings, 0 replies; 34+ messages in thread
From: YueHaibing @ 2017-01-20  3:39 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Eric Dumazet, netdev, linux-sctp


On 2016/12/20 4:37, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 19 Dec 2016, Eric Dumazet wrote:
> 
>> I am still digesting this awesome patch series ;)
> 
> 	Thanks. I don't feel quite comfortable with some
> of the changes (mostly XFRM, dst_confirm usage in CXGB) and
> I hope the discussion can provide adequate solution.
> 
>> Not sure why you used an unlikely() here. TCP for example would hit this
>> path quite often.
> 
> 	I was not sure, may be because ACKs can come with lower
> rate than the sent packets. Also because non-TCP rarely uses
> MSG_CONFIRM. If you still think it is better without unlikely,
> I'll remove it.
> 
>> So considering sk_dst_pending_confirm might be dirtied quite often,further 
>>
>> I am not sure why you placed it in the cache line that contains
>> sk_rx_dst (in 1st patch)
> 
> 	I saw your recent changes and was worried if the
> sk_dst_confirm() calling on RX can cause unwanted dirtying of
> additional cache line. My preliminary analyze pointed
> sk_omem_alloc as good candidate for moving to next cache
> line. I know how critical is to properly place the new flags,
> so I really need recommendations about this.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 
> .
> 

Sorry for so late reply.
I have test your new patch, It works well in my scene.

Everybody,Is there any further comment about this awesome patch series?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
  2017-01-20  3:39         ` YueHaibing
@ 2017-01-21 15:38           ` Julian Anastasov
  -1 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2017-01-21 15:38 UTC (permalink / raw)
  To: YueHaibing; +Cc: Eric Dumazet, netdev, linux-sctp


	Hello,

On Fri, 20 Jan 2017, YueHaibing wrote:

> Sorry for so late reply.
> I have test your new patch, It works well in my scene.

	Thanks! I'll prepare 2nd version in the following days...

Regards

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
@ 2017-01-21 15:38           ` Julian Anastasov
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Anastasov @ 2017-01-21 15:38 UTC (permalink / raw)
  To: YueHaibing; +Cc: Eric Dumazet, netdev, linux-sctp


	Hello,

On Fri, 20 Jan 2017, YueHaibing wrote:

> Sorry for so late reply.
> I have test your new patch, It works well in my scene.

	Thanks! I'll prepare 2nd version in the following days...

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-01-21 15:38 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-18 20:56 [PATCH RFC net-next 0/7] net: dst_confirm replacement Julian Anastasov
2016-12-18 20:56 ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 1/7] sock: add sk_dst_pending_confirm flag Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-19 16:17   ` Eric Dumazet
2016-12-19 16:17     ` Eric Dumazet
2016-12-19 16:36     ` Hannes Frederic Sowa
2016-12-19 16:36       ` Hannes Frederic Sowa
2016-12-19 16:40       ` Eric Dumazet
2016-12-19 16:40         ` Eric Dumazet
2016-12-19 16:53         ` Hannes Frederic Sowa
2016-12-19 16:53           ` Hannes Frederic Sowa
2016-12-19 20:37     ` Julian Anastasov
2016-12-19 20:37       ` Julian Anastasov
2017-01-20  3:39       ` YueHaibing
2017-01-20  3:39         ` YueHaibing
2017-01-21 15:38         ` Julian Anastasov
2017-01-21 15:38           ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 3/7] sctp: add dst_pending_confirm flag Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-21 14:11   ` Neil Horman
2016-12-21 14:11     ` Neil Horman
2016-12-21 20:19     ` Julian Anastasov
2016-12-21 20:19       ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 4/7] tcp: replace dst_confirm with sk_dst_confirm Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 5/7] net: add confirm_neigh method to dst_ops Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 6/7] net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov
2016-12-18 20:56 ` [PATCH RFC net-next 7/7] net: pending_confirm is not used anymore Julian Anastasov
2016-12-18 20:56   ` Julian Anastasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.