All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/3] udp: reduce cache pressure
@ 2017-06-12  9:23 Paolo Abeni
  2017-06-12  9:23 ` [PATCH net-next v3 1/3] net: factor out a helper to decrement the skb refcount Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-12  9:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet

In the most common use case, many skb fields are not used by recvmsg(), and
the few ones actually accessed lays on cold cachelines, which leads to several
cache miss per packet.

This patch series attempts to reduce such misses with different strategies:
* caching the interesting fields in the scratched space
* avoid accessing at all uninteresting fields
* prefetching

Tested using the udp_sink program by Jesper[1] as the receiver, an h/w l4 rx
hash on the ingress nic, so that the number of ingress nic rx queues hit by the
udp traffic could be controlled via ethtool -L.

The udp_sink program was bound to the first idle cpu, to get more
stable numbers.

On a single numa node receiver:

nic rx queues           vanilla                 patched kernel      delta
1                       1850 kpps               1850 kpps           0%
2                       2370 kpps               2700 kpps           13.9%
16                      2000 kpps               2220 kpps           11%

[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

v1 -> v2:
  - replaced secpath_reset() with skb_release_head_state()
  - changed udp_dev_scratch fields types to u{32,16} variant,
    replaced bitfield with bool

v2 -> v3:
  - no changes, tested against apachebench for performances regression

Paolo Abeni (3):
  net: factor out a helper to decrement the skb refcount
  udp: avoid a cache miss on dequeue
  udp: try to avoid 2 cache miss on dequeue

 include/linux/skbuff.h |  15 +++++++
 net/core/datagram.c    |   4 +-
 net/core/skbuff.c      |  38 ++++++++++------
 net/ipv4/udp.c         | 120 ++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 148 insertions(+), 29 deletions(-)

-- 
2.9.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH net-next v3 1/3] net: factor out a helper to decrement the skb refcount
  2017-06-12  9:23 [PATCH net-next v3 0/3] udp: reduce cache pressure Paolo Abeni
@ 2017-06-12  9:23 ` Paolo Abeni
  2017-06-12  9:23 ` [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-12  9:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet

The same code is replicated in 3 different places; move it to a
common helper.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h | 13 +++++++++++++
 net/core/datagram.c    |  4 +---
 net/core/skbuff.c      | 14 ++++----------
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d460a4c..decce36 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -867,6 +867,19 @@ static inline unsigned int skb_napi_id(const struct sk_buff *skb)
 #endif
 }
 
+/* decrement the reference count and return true if we can free the skb */
+static inline bool skb_unref(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return false;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return false;
+
+	return true;
+}
+
 void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index bc46118..e5311a7 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -330,9 +330,7 @@ void __skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb, int len)
 {
 	bool slow;
 
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users))) {
+	if (!skb_unref(skb)) {
 		sk_peek_offset_bwd(sk, len);
 		return;
 	}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e508c1e..747263c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -694,12 +694,9 @@ EXPORT_SYMBOL(__kfree_skb);
  */
 void kfree_skb(struct sk_buff *skb)
 {
-	if (unlikely(!skb))
-		return;
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	if (!skb_unref(skb))
 		return;
+
 	trace_kfree_skb(skb, __builtin_return_address(0));
 	__kfree_skb(skb);
 }
@@ -746,12 +743,9 @@ EXPORT_SYMBOL(skb_tx_error);
  */
 void consume_skb(struct sk_buff *skb)
 {
-	if (unlikely(!skb))
-		return;
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	if (!skb_unref(skb))
 		return;
+
 	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue
  2017-06-12  9:23 [PATCH net-next v3 0/3] udp: reduce cache pressure Paolo Abeni
  2017-06-12  9:23 ` [PATCH net-next v3 1/3] net: factor out a helper to decrement the skb refcount Paolo Abeni
@ 2017-06-12  9:23 ` Paolo Abeni
  2017-07-17  7:04   ` Eric Dumazet
  2017-06-12  9:23 ` [PATCH net-next v3 3/3] udp: try to avoid 2 cache miss on dequeue Paolo Abeni
  2017-06-12 14:01 ` [PATCH net-next v3 0/3] udp: reduce cache pressure David Miller
  3 siblings, 1 reply; 29+ messages in thread
From: Paolo Abeni @ 2017-06-12  9:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet

Since UDP no more uses sk->destructor, we can clear completely
the skb head state before enqueuing. Amend and use
skb_release_head_state() for that.

All head states share a single cacheline, which is not
normally used/accesses on dequeue. We can avoid entirely accessing
such cacheline implementing and using in the UDP code a specialized
skb free helper which ignores the skb head state.

This saves a cacheline miss at skb deallocation time.

v1 -> v2:
  replaced secpath_reset() with skb_release_head_state()

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 24 ++++++++++++++++++++----
 net/ipv4/udp.c         |  6 +++++-
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index decce36..d66d4fe 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -880,10 +880,12 @@ static inline bool skb_unref(struct sk_buff *skb)
 	return true;
 }
 
+void skb_release_head_state(struct sk_buff *skb);
 void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
 void consume_skb(struct sk_buff *skb);
+void consume_stateless_skb(struct sk_buff *skb);
 void  __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 747263c..3046027 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -643,12 +643,10 @@ static void kfree_skbmem(struct sk_buff *skb)
 	kmem_cache_free(skbuff_fclone_cache, fclones);
 }
 
-static void skb_release_head_state(struct sk_buff *skb)
+void skb_release_head_state(struct sk_buff *skb)
 {
 	skb_dst_drop(skb);
-#ifdef CONFIG_XFRM
-	secpath_put(skb->sp);
-#endif
+	secpath_reset(skb);
 	if (skb->destructor) {
 		WARN_ON(in_irq());
 		skb->destructor(skb);
@@ -751,6 +749,24 @@ void consume_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(consume_skb);
 
+/**
+ *	consume_stateless_skb - free an skbuff, assuming it is stateless
+ *	@skb: buffer to free
+ *
+ *	Works like consume_skb(), but this variant assumes that all the head
+ *	states have been already dropped.
+ */
+void consume_stateless_skb(struct sk_buff *skb)
+{
+	if (!skb_unref(skb))
+		return;
+
+	trace_consume_skb(skb);
+	if (likely(skb->head))
+		skb_release_data(skb);
+	kfree_skbmem(skb);
+}
+
 void __kfree_skb_flush(void)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index fdcb743..d8b265f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1359,7 +1359,8 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		sk_peek_offset_bwd(sk, len);
 		unlock_sock_fast(sk, slow);
 	}
-	consume_skb(skb);
+
+	consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
 
@@ -1739,6 +1740,9 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk_mark_napi_id_once(sk, skb);
 	}
 
+	/* clear all pending head states while they are hot in the cache */
+	skb_release_head_state(skb);
+
 	rc = __udp_enqueue_schedule_skb(sk, skb);
 	if (rc < 0) {
 		int is_udplite = IS_UDPLITE(sk);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH net-next v3 3/3] udp: try to avoid 2 cache miss on dequeue
  2017-06-12  9:23 [PATCH net-next v3 0/3] udp: reduce cache pressure Paolo Abeni
  2017-06-12  9:23 ` [PATCH net-next v3 1/3] net: factor out a helper to decrement the skb refcount Paolo Abeni
  2017-06-12  9:23 ` [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue Paolo Abeni
@ 2017-06-12  9:23 ` Paolo Abeni
  2017-06-22 13:06     ` Michael Ellerman
  2017-06-12 14:01 ` [PATCH net-next v3 0/3] udp: reduce cache pressure David Miller
  3 siblings, 1 reply; 29+ messages in thread
From: Paolo Abeni @ 2017-06-12  9:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet

when udp_recvmsg() is executed, on x86_64 and other archs, most skb
fields are on cold cachelines.
If the skb are linear and the kernel don't need to compute the udp
csum, only a handful of skb fields are required by udp_recvmsg().
Since we already use skb->dev_scratch to cache hot data, and
there are 32 bits unused on 64 bit archs, use such field to cache
as much data as we can, and try to prefetch on dequeue the relevant
fields that are left out.

This can save up to 2 cache miss per packet.

v1 -> v2:
  - changed udp_dev_scratch fields types to u{32,16} variant,
    replaced bitfiled with bool

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d8b265f..2bc638c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1163,6 +1163,83 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	return ret;
 }
 
+/* Copy as much information as possible into skb->dev_scratch to avoid
+ * possibly multiple cache miss on dequeue();
+ */
+#if BITS_PER_LONG == 64
+
+/* we can store multiple info here: truesize, len and the bit needed to
+ * compute skb_csum_unnecessary will be on cold cache lines at recvmsg
+ * time.
+ * skb->len can be stored on 16 bits since the udp header has been already
+ * validated and pulled.
+ */
+struct udp_dev_scratch {
+	u32 truesize;
+	u16 len;
+	bool is_linear;
+	bool csum_unnecessary;
+};
+
+static void udp_set_dev_scratch(struct sk_buff *skb)
+{
+	struct udp_dev_scratch *scratch;
+
+	BUILD_BUG_ON(sizeof(struct udp_dev_scratch) > sizeof(long));
+	scratch = (struct udp_dev_scratch *)&skb->dev_scratch;
+	scratch->truesize = skb->truesize;
+	scratch->len = skb->len;
+	scratch->csum_unnecessary = !!skb_csum_unnecessary(skb);
+	scratch->is_linear = !skb_is_nonlinear(skb);
+}
+
+static int udp_skb_truesize(struct sk_buff *skb)
+{
+	return ((struct udp_dev_scratch *)&skb->dev_scratch)->truesize;
+}
+
+static unsigned int udp_skb_len(struct sk_buff *skb)
+{
+	return ((struct udp_dev_scratch *)&skb->dev_scratch)->len;
+}
+
+static bool udp_skb_csum_unnecessary(struct sk_buff *skb)
+{
+	return ((struct udp_dev_scratch *)&skb->dev_scratch)->csum_unnecessary;
+}
+
+static bool udp_skb_is_linear(struct sk_buff *skb)
+{
+	return ((struct udp_dev_scratch *)&skb->dev_scratch)->is_linear;
+}
+
+#else
+static void udp_set_dev_scratch(struct sk_buff *skb)
+{
+	skb->dev_scratch = skb->truesize;
+}
+
+static int udp_skb_truesize(struct sk_buff *skb)
+{
+	return skb->dev_scratch;
+}
+
+static unsigned int udp_skb_len(struct sk_buff *skb)
+{
+	return skb->len;
+}
+
+static bool udp_skb_csum_unnecessary(struct sk_buff *skb)
+{
+	return skb_csum_unnecessary(skb);
+}
+
+static bool udp_skb_is_linear(struct sk_buff *skb)
+{
+	return !skb_is_nonlinear(skb);
+}
+#endif
+
 /* fully reclaim rmem/fwd memory allocated for skb */
 static void udp_rmem_release(struct sock *sk, int size, int partial,
 			     bool rx_queue_lock_held)
@@ -1213,14 +1290,16 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
  */
 void udp_skb_destructor(struct sock *sk, struct sk_buff *skb)
 {
-	udp_rmem_release(sk, skb->dev_scratch, 1, false);
+	prefetch(&skb->data);
+	udp_rmem_release(sk, udp_skb_truesize(skb), 1, false);
 }
 EXPORT_SYMBOL(udp_skb_destructor);
 
 /* as above, but the caller held the rx queue lock, too */
 static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
 {
-	udp_rmem_release(sk, skb->dev_scratch, 1, true);
+	prefetch(&skb->data);
+	udp_rmem_release(sk, udp_skb_truesize(skb), 1, true);
 }
 
 /* Idea of busylocks is to let producers grab an extra spinlock
@@ -1274,10 +1353,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 		busy = busylock_acquire(sk);
 	}
 	size = skb->truesize;
-	/* Copy skb->truesize into skb->dev_scratch to avoid a cache line miss
-	 * in udp_skb_destructor()
-	 */
-	skb->dev_scratch = size;
+	udp_set_dev_scratch(skb);
 
 	/* we drop only if the receive buf is full and the receive
 	 * queue contains some other skb
@@ -1515,6 +1591,18 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
 }
 EXPORT_SYMBOL_GPL(__skb_recv_udp);
 
+static int copy_linear_skb(struct sk_buff *skb, int len, int off,
+			   struct iov_iter *to)
+{
+	int n, copy = len - off;
+
+	n = copy_to_iter(skb->data + off, copy, to);
+	if (n == copy)
+		return 0;
+
+	return -EFAULT;
+}
+
 /*
  * 	This should be easy, if there is something there we
  * 	return it, otherwise we block.
@@ -1541,7 +1629,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 	if (!skb)
 		return err;
 
-	ulen = skb->len;
+	ulen = udp_skb_len(skb);
 	copied = len;
 	if (copied > ulen - off)
 		copied = ulen - off;
@@ -1556,14 +1644,18 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 
 	if (copied < ulen || peeking ||
 	    (is_udplite && UDP_SKB_CB(skb)->partial_cov)) {
-		checksum_valid = !udp_lib_checksum_complete(skb);
+		checksum_valid = udp_skb_csum_unnecessary(skb) ||
+				!__udp_lib_checksum_complete(skb);
 		if (!checksum_valid)
 			goto csum_copy_err;
 	}
 
-	if (checksum_valid || skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_msg(skb, off, msg, copied);
-	else {
+	if (checksum_valid || udp_skb_csum_unnecessary(skb)) {
+		if (udp_skb_is_linear(skb))
+			err = copy_linear_skb(skb, copied, off, &msg->msg_iter);
+		else
+			err = skb_copy_datagram_msg(skb, off, msg, copied);
+	} else {
 		err = skb_copy_and_csum_datagram_msg(skb, off, msg);
 
 		if (err == -EINVAL)
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v3 0/3] udp: reduce cache pressure
  2017-06-12  9:23 [PATCH net-next v3 0/3] udp: reduce cache pressure Paolo Abeni
                   ` (2 preceding siblings ...)
  2017-06-12  9:23 ` [PATCH net-next v3 3/3] udp: try to avoid 2 cache miss on dequeue Paolo Abeni
@ 2017-06-12 14:01 ` David Miller
  3 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2017-06-12 14:01 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, edumazet

From: Paolo Abeni <pabeni@redhat.com>
Date: Mon, 12 Jun 2017 11:23:40 +0200

> In the most common use case, many skb fields are not used by recvmsg(), and
> the few ones actually accessed lays on cold cachelines, which leads to several
> cache miss per packet.
> 
> This patch series attempts to reduce such misses with different strategies:
> * caching the interesting fields in the scratched space
> * avoid accessing at all uninteresting fields
> * prefetching
> 
> Tested using the udp_sink program by Jesper[1] as the receiver, an h/w l4 rx
> hash on the ingress nic, so that the number of ingress nic rx queues hit by the
> udp traffic could be controlled via ethtool -L.
> 
> The udp_sink program was bound to the first idle cpu, to get more
> stable numbers.
> 
> On a single numa node receiver:
> 
> nic rx queues           vanilla                 patched kernel      delta
> 1                       1850 kpps               1850 kpps           0%
> 2                       2370 kpps               2700 kpps           13.9%
> 16                      2000 kpps               2220 kpps           11%
> 
> [1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

Series applied, thank you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-12  9:23 ` [PATCH net-next v3 3/3] udp: try to avoid 2 cache miss on dequeue Paolo Abeni
@ 2017-06-22 13:06     ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-22 13:06 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Eric Dumazet, linux-next, linuxppc-dev, David S. Miller

Paolo wrote:
> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> fields are on cold cachelines.
> If the skb are linear and the kernel don't need to compute the udp
> csum, only a handful of skb fields are required by udp_recvmsg().
> Since we already use skb->dev_scratch to cache hot data, and
> there are 32 bits unused on 64 bit archs, use such field to cache
> as much data as we can, and try to prefetch on dequeue the relevant
> fields that are left out.
> 
> This can save up to 2 cache miss per packet.
> 
> v1 -> v2:
>   - changed udp_dev_scratch fields types to u{32,16} variant,
>     replaced bitfiled with bool
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 103 insertions(+), 11 deletions(-)

This appears to break wget on one of my machines.

Networking in general is working, I'm able to SSH in, but then I can't
do a wget.

eg:

$ wget google.com
--2017-06-22 22:45:39--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘proxy.pmdw.com’

$ host proxy.pmdw.com
proxy.pmdw.com is an alias for raven.pmdw.com.
raven.pmdw.com has address 10.1.2.3

$ wget google.com
--2017-06-22 22:52:08--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘proxy.pmdw.com’

Maybe host is using TCP but the man page says it doesn't?


Everything is OK if I boot back to the previous commit 0a463c78d25b
("udp: avoid a cache miss on dequeue"):

$ wget google.com
--2017-06-22 23:00:01--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
--2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
Reusing existing connection to proxy.pmdw.com:3128.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  

2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]

$ uname -a
Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux


Haven't had time to debug any further. Any ideas?

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 13:06     ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-22 13:06 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

Paolo wrote:
> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> fields are on cold cachelines.
> If the skb are linear and the kernel don't need to compute the udp
> csum, only a handful of skb fields are required by udp_recvmsg().
> Since we already use skb->dev_scratch to cache hot data, and
> there are 32 bits unused on 64 bit archs, use such field to cache
> as much data as we can, and try to prefetch on dequeue the relevant
> fields that are left out.
>=20
> This can save up to 2 cache miss per packet.
>=20
> v1 -> v2:
>   - changed udp_dev_scratch fields types to u{32,16} variant,
>     replaced bitfiled with bool
>=20
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++=
------
>  1 file changed, 103 insertions(+), 11 deletions(-)

This appears to break wget on one of my machines.

Networking in general is working, I'm able to SSH in, but then I can't
do a wget.

eg:

$ wget google.com
--2017-06-22 22:45:39--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in n=
ame resolution.
wget: unable to resolve host address =E2=80=98proxy.pmdw.com=E2=80=99

$ host proxy.pmdw.com
proxy.pmdw.com is an alias for raven.pmdw.com.
raven.pmdw.com has address 10.1.2.3

$ wget google.com
--2017-06-22 22:52:08--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in n=
ame resolution.
wget: unable to resolve host address =E2=80=98proxy.pmdw.com=E2=80=99

Maybe host is using TCP but the man page says it doesn't?


Everything is OK if I boot back to the previous commit 0a463c78d25b
("udp: avoid a cache miss on dequeue"):

$ wget google.com
--2017-06-22 23:00:01--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.google.com.au/?gfe_rd=3Dcr&ei=3DUb9LWbPbLujDXrH1uPgE [=
following]
--2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=3Dcr&ei=3DUb9LWbP=
bLujDXrH1uPgE
Reusing existing connection to proxy.pmdw.com:3128.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: =E2=80=98index.html=E2=80=99

index.html                      [ <=3D>                                    =
   ]  11.37K  --.-KB/s    in 0.001s=20=20

2017-06-22 23:00:01 (22.0 MB/s) - =E2=80=98index.html=E2=80=99 saved [11640]

$ uname -a
Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 =
ppc64 GNU/Linux


Haven't had time to debug any further. Any ideas?

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 13:06     ` Michael Ellerman
@ 2017-06-22 16:43       ` Paolo Abeni
  -1 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 16:43 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> Paolo wrote:
> > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > fields are on cold cachelines.
> > If the skb are linear and the kernel don't need to compute the udp
> > csum, only a handful of skb fields are required by udp_recvmsg().
> > Since we already use skb->dev_scratch to cache hot data, and
> > there are 32 bits unused on 64 bit archs, use such field to cache
> > as much data as we can, and try to prefetch on dequeue the relevant
> > fields that are left out.
> > 
> > This can save up to 2 cache miss per packet.
> > 
> > v1 -> v2:
> >   - changed udp_dev_scratch fields types to u{32,16} variant,
> >     replaced bitfiled with bool
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 103 insertions(+), 11 deletions(-)
> 
> This appears to break wget on one of my machines.
> 
> Networking in general is working, I'm able to SSH in, but then I can't
> do a wget.
> 
> eg:
> 
> $ wget google.com
> --2017-06-22 22:45:39--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> wget: unable to resolve host address ‘proxy.pmdw.com’
> 
> $ host proxy.pmdw.com
> proxy.pmdw.com is an alias for raven.pmdw.com.
> raven.pmdw.com has address 10.1.2.3
> 
> $ wget google.com
> --2017-06-22 22:52:08--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> wget: unable to resolve host address ‘proxy.pmdw.com’
> 
> Maybe host is using TCP but the man page says it doesn't?
> 
> 
> Everything is OK if I boot back to the previous commit 0a463c78d25b
> ("udp: avoid a cache miss on dequeue"):
> 
> $ wget google.com
> --2017-06-22 23:00:01--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> Proxy request sent, awaiting response... 302 Found
> Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> Reusing existing connection to proxy.pmdw.com:3128.
> Proxy request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
> 
> index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> 
> 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> 
> $ uname -a
> Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> 
> 
> Haven't had time to debug any further. Any ideas?

Thank you for this report.

Can you please specify features of the relevant NIC ? (ethtool -k
<name>) 

I'll try to replicate the issue as soon I'll get hands on suitable HW,
meanwhile can you please try to trace the system behavior with perf?

Something like:

perf probe -a __udp_enqueue_schedule_skb
perf probe -a udp_recvmsg
perf probe -a udpv6_recvmsg
perf record -e probe:__udp_enqueue_schedule_skb -e probe:udp_recvmsg -e probe:udpv6_recvmsg -ag wget google.com
perf report --stdio

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 16:43       ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 16:43 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> Paolo wrote:
> > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > fields are on cold cachelines.
> > If the skb are linear and the kernel don't need to compute the udp
> > csum, only a handful of skb fields are required by udp_recvmsg().
> > Since we already use skb->dev_scratch to cache hot data, and
> > there are 32 bits unused on 64 bit archs, use such field to cache
> > as much data as we can, and try to prefetch on dequeue the relevant
> > fields that are left out.
> > 
> > This can save up to 2 cache miss per packet.
> > 
> > v1 -> v2:
> >   - changed udp_dev_scratch fields types to u{32,16} variant,
> >     replaced bitfiled with bool
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 103 insertions(+), 11 deletions(-)
> 
> This appears to break wget on one of my machines.
> 
> Networking in general is working, I'm able to SSH in, but then I can't
> do a wget.
> 
> eg:
> 
> $ wget google.com
> --2017-06-22 22:45:39--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> wget: unable to resolve host address ‘proxy.pmdw.com’
> 
> $ host proxy.pmdw.com
> proxy.pmdw.com is an alias for raven.pmdw.com.
> raven.pmdw.com has address 10.1.2.3
> 
> $ wget google.com
> --2017-06-22 22:52:08--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> wget: unable to resolve host address ‘proxy.pmdw.com’
> 
> Maybe host is using TCP but the man page says it doesn't?
> 
> 
> Everything is OK if I boot back to the previous commit 0a463c78d25b
> ("udp: avoid a cache miss on dequeue"):
> 
> $ wget google.com
> --2017-06-22 23:00:01--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> Proxy request sent, awaiting response... 302 Found
> Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> Reusing existing connection to proxy.pmdw.com:3128.
> Proxy request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
> 
> index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> 
> 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> 
> $ uname -a
> Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> 
> 
> Haven't had time to debug any further. Any ideas?

Thank you for this report.

Can you please specify features of the relevant NIC ? (ethtool -k
<name>) 

I'll try to replicate the issue as soon I'll get hands on suitable HW,
meanwhile can you please try to trace the system behavior with perf?

Something like:

perf probe -a __udp_enqueue_schedule_skb
perf probe -a udp_recvmsg
perf probe -a udpv6_recvmsg
perf record -e probe:__udp_enqueue_schedule_skb -e probe:udp_recvmsg -e probe:udpv6_recvmsg -ag wget google.com
perf report --stdio

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 16:43       ` Paolo Abeni
@ 2017-06-22 20:27         ` Paolo Abeni
  -1 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:27 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote:
> On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> > Paolo wrote:
> > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > > fields are on cold cachelines.
> > > If the skb are linear and the kernel don't need to compute the udp
> > > csum, only a handful of skb fields are required by udp_recvmsg().
> > > Since we already use skb->dev_scratch to cache hot data, and
> > > there are 32 bits unused on 64 bit archs, use such field to cache
> > > as much data as we can, and try to prefetch on dequeue the relevant
> > > fields that are left out.
> > > 
> > > This can save up to 2 cache miss per packet.
> > > 
> > > v1 -> v2:
> > >   - changed udp_dev_scratch fields types to u{32,16} variant,
> > >     replaced bitfiled with bool
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > Acked-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 103 insertions(+), 11 deletions(-)
> > 
> > This appears to break wget on one of my machines.
> > 
> > Networking in general is working, I'm able to SSH in, but then I can't
> > do a wget.
> > 
> > eg:
> > 
> > $ wget google.com
> > --2017-06-22 22:45:39--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > $ host proxy.pmdw.com
> > proxy.pmdw.com is an alias for raven.pmdw.com.
> > raven.pmdw.com has address 10.1.2.3
> > 
> > $ wget google.com
> > --2017-06-22 22:52:08--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > Maybe host is using TCP but the man page says it doesn't?
> > 
> > 
> > Everything is OK if I boot back to the previous commit 0a463c78d25b
> > ("udp: avoid a cache miss on dequeue"):
> > 
> > $ wget google.com
> > --2017-06-22 23:00:01--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> > Proxy request sent, awaiting response... 302 Found
> > Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> > --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> > Reusing existing connection to proxy.pmdw.com:3128.
> > Proxy request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: ‘index.html’
> > 
> > index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> > 
> > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> > 
> > $ uname -a
> > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> > 
> > 
> > Haven't had time to debug any further. Any ideas?
> 
> Thank you for this report.
> 
> Can you please specify features of the relevant NIC ? (ethtool -k
> <name>) 
> 
> I'll try to replicate the issue as soon I'll get hands on suitable HW,

I had my hands on power7, but I can't trivially reproduce the issue so
I'm going to bug you for more info. 

Can you please specify the host CPU, the NIC in use (ethtool -i
<name>), the compiler version used to build the kernel and possibly
provide a tcpdump of the DNS packets received/sent while running wget
and while running the host command?

Do you have the relevant kernel running on others PPC hosts?

Thank you,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 20:27         ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:27 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote:
> On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> > Paolo wrote:
> > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > > fields are on cold cachelines.
> > > If the skb are linear and the kernel don't need to compute the udp
> > > csum, only a handful of skb fields are required by udp_recvmsg().
> > > Since we already use skb->dev_scratch to cache hot data, and
> > > there are 32 bits unused on 64 bit archs, use such field to cache
> > > as much data as we can, and try to prefetch on dequeue the relevant
> > > fields that are left out.
> > > 
> > > This can save up to 2 cache miss per packet.
> > > 
> > > v1 -> v2:
> > >   - changed udp_dev_scratch fields types to u{32,16} variant,
> > >     replaced bitfiled with bool
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > Acked-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 103 insertions(+), 11 deletions(-)
> > 
> > This appears to break wget on one of my machines.
> > 
> > Networking in general is working, I'm able to SSH in, but then I can't
> > do a wget.
> > 
> > eg:
> > 
> > $ wget google.com
> > --2017-06-22 22:45:39--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > $ host proxy.pmdw.com
> > proxy.pmdw.com is an alias for raven.pmdw.com.
> > raven.pmdw.com has address 10.1.2.3
> > 
> > $ wget google.com
> > --2017-06-22 22:52:08--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > Maybe host is using TCP but the man page says it doesn't?
> > 
> > 
> > Everything is OK if I boot back to the previous commit 0a463c78d25b
> > ("udp: avoid a cache miss on dequeue"):
> > 
> > $ wget google.com
> > --2017-06-22 23:00:01--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> > Proxy request sent, awaiting response... 302 Found
> > Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> > --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> > Reusing existing connection to proxy.pmdw.com:3128.
> > Proxy request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: ‘index.html’
> > 
> > index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> > 
> > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> > 
> > $ uname -a
> > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> > 
> > 
> > Haven't had time to debug any further. Any ideas?
> 
> Thank you for this report.
> 
> Can you please specify features of the relevant NIC ? (ethtool -k
> <name>) 
> 
> I'll try to replicate the issue as soon I'll get hands on suitable HW,

I had my hands on power7, but I can't trivially reproduce the issue so
I'm going to bug you for more info. 

Can you please specify the host CPU, the NIC in use (ethtool -i
<name>), the compiler version used to build the kernel and possibly
provide a tcpdump of the DNS packets received/sent while running wget
and while running the host command?

Do you have the relevant kernel running on others PPC hosts?

Thank you,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 16:43       ` Paolo Abeni
@ 2017-06-22 20:28         ` Paolo Abeni
  -1 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:28 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote:
> On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> > Paolo wrote:
> > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > > fields are on cold cachelines.
> > > If the skb are linear and the kernel don't need to compute the udp
> > > csum, only a handful of skb fields are required by udp_recvmsg().
> > > Since we already use skb->dev_scratch to cache hot data, and
> > > there are 32 bits unused on 64 bit archs, use such field to cache
> > > as much data as we can, and try to prefetch on dequeue the relevant
> > > fields that are left out.
> > > 
> > > This can save up to 2 cache miss per packet.
> > > 
> > > v1 -> v2:
> > >   - changed udp_dev_scratch fields types to u{32,16} variant,
> > >     replaced bitfiled with bool
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > Acked-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 103 insertions(+), 11 deletions(-)
> > 
> > This appears to break wget on one of my machines.
> > 
> > Networking in general is working, I'm able to SSH in, but then I can't
> > do a wget.
> > 
> > eg:
> > 
> > $ wget google.com
> > --2017-06-22 22:45:39--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > $ host proxy.pmdw.com
> > proxy.pmdw.com is an alias for raven.pmdw.com.
> > raven.pmdw.com has address 10.1.2.3
> > 
> > $ wget google.com
> > --2017-06-22 22:52:08--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > Maybe host is using TCP but the man page says it doesn't?
> > 
> > 
> > Everything is OK if I boot back to the previous commit 0a463c78d25b
> > ("udp: avoid a cache miss on dequeue"):
> > 
> > $ wget google.com
> > --2017-06-22 23:00:01--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> > Proxy request sent, awaiting response... 302 Found
> > Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> > --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> > Reusing existing connection to proxy.pmdw.com:3128.
> > Proxy request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: ‘index.html’
> > 
> > index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> > 
> > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> > 
> > $ uname -a
> > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> > 
> > 
> > Haven't had time to debug any further. Any ideas?
> 
> Thank you for this report.
> 
> Can you please specify features of the relevant NIC ? (ethtool -k
> <name>) 
> 
> I'll try to replicate the issue as soon I'll get hands on suitable HW,

I had my hands on power7, but I can't trivially reproduce the issue so
I'm going to bug you for more info. 

Can you please specify the host CPU, the NIC in use (ethtool -i
<name>), the compiler version used to build the kernel and possibly
provide a tcpdump of the DNS packets received/sent while running wget
and while running the host command?

Do you have the relevant kernel running on others PPC hosts?

Thank you,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 20:28         ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:28 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 18:43 +0200, Paolo Abeni wrote:
> On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> > Paolo wrote:
> > > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > > fields are on cold cachelines.
> > > If the skb are linear and the kernel don't need to compute the udp
> > > csum, only a handful of skb fields are required by udp_recvmsg().
> > > Since we already use skb->dev_scratch to cache hot data, and
> > > there are 32 bits unused on 64 bit archs, use such field to cache
> > > as much data as we can, and try to prefetch on dequeue the relevant
> > > fields that are left out.
> > > 
> > > This can save up to 2 cache miss per packet.
> > > 
> > > v1 -> v2:
> > >   - changed udp_dev_scratch fields types to u{32,16} variant,
> > >     replaced bitfiled with bool
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > Acked-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 103 insertions(+), 11 deletions(-)
> > 
> > This appears to break wget on one of my machines.
> > 
> > Networking in general is working, I'm able to SSH in, but then I can't
> > do a wget.
> > 
> > eg:
> > 
> > $ wget google.com
> > --2017-06-22 22:45:39--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > $ host proxy.pmdw.com
> > proxy.pmdw.com is an alias for raven.pmdw.com.
> > raven.pmdw.com has address 10.1.2.3
> > 
> > $ wget google.com
> > --2017-06-22 22:52:08--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... failed: Temporary failure in name resolution.
> > wget: unable to resolve host address ‘proxy.pmdw.com’
> > 
> > Maybe host is using TCP but the man page says it doesn't?
> > 
> > 
> > Everything is OK if I boot back to the previous commit 0a463c78d25b
> > ("udp: avoid a cache miss on dequeue"):
> > 
> > $ wget google.com
> > --2017-06-22 23:00:01--  http://google.com/
> > Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> > Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> > Proxy request sent, awaiting response... 302 Found
> > Location: http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE [following]
> > --2017-06-22 23:00:01--  http://www.google.com.au/?gfe_rd=cr&ei=Ub9LWbPbLujDXrH1uPgE
> > Reusing existing connection to proxy.pmdw.com:3128.
> > Proxy request sent, awaiting response... 200 OK
> > Length: unspecified [text/html]
> > Saving to: ‘index.html’
> > 
> > index.html                      [ <=>                                       ]  11.37K  --.-KB/s    in 0.001s  
> > 
> > 2017-06-22 23:00:01 (22.0 MB/s) - ‘index.html’ saved [11640]
> > 
> > $ uname -a
> > Linux 4.12.0-rc4-gcc6-00988-g0a463c7 #88 SMP Thu Jun 22 22:55:12 AEST 2017 ppc64 GNU/Linux
> > 
> > 
> > Haven't had time to debug any further. Any ideas?
> 
> Thank you for this report.
> 
> Can you please specify features of the relevant NIC ? (ethtool -k
> <name>) 
> 
> I'll try to replicate the issue as soon I'll get hands on suitable HW,

I had my hands on power7, but I can't trivially reproduce the issue so
I'm going to bug you for more info. 

Can you please specify the host CPU, the NIC in use (ethtool -i
<name>), the compiler version used to build the kernel and possibly
provide a tcpdump of the DNS packets received/sent while running wget
and while running the host command?

Do you have the relevant kernel running on others PPC hosts?

Thank you,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 13:06     ` Michael Ellerman
@ 2017-06-22 20:57       ` Paolo Abeni
  -1 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> Paolo wrote:
> > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > fields are on cold cachelines.
> > If the skb are linear and the kernel don't need to compute the udp
> > csum, only a handful of skb fields are required by udp_recvmsg().
> > Since we already use skb->dev_scratch to cache hot data, and
> > there are 32 bits unused on 64 bit archs, use such field to cache
> > as much data as we can, and try to prefetch on dequeue the relevant
> > fields that are left out.
> > 
> > This can save up to 2 cache miss per packet.
> > 
> > v1 -> v2:
> >   - changed udp_dev_scratch fields types to u{32,16} variant,
> >     replaced bitfiled with bool
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 103 insertions(+), 11 deletions(-)
> 
> This appears to break wget on one of my machines.
> 
> Networking in general is working, I'm able to SSH in, but then I can't
> do a wget.

Can you please check if the following patch fixes the issue? Only
compiled tested here.

Thanks!!!
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607..80d89fe 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,19 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
 {
 	struct sk_buff *skb;
 
-	while ((skb = skb_peek(rcvq)) != NULL &&
-	       udp_lib_checksum_complete(skb)) {
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-				IS_UDPLITE(sk));
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-				IS_UDPLITE(sk));
-		atomic_inc(&sk->sk_drops);
-		__skb_unlink(skb, rcvq);
-		*total += skb->truesize;
-		kfree_skb(skb);
+	while ((skb = skb_peek(rcvq)) != NULL) {
+		if (udp_lib_checksum_complete(skb)) {
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+					IS_UDPLITE(sk));
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+					IS_UDPLITE(sk));
+			atomic_inc(&sk->sk_drops);
+			__skb_unlink(skb, rcvq);
+			*total += skb->truesize;
+			kfree_skb(skb);
+		} else {
+			udp_set_dev_scratch(skb);
+		}
 	}
 	return skb;
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 20:57       ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-22 20:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, 2017-06-22 at 23:06 +1000, Michael Ellerman wrote:
> Paolo wrote:
> > when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> > fields are on cold cachelines.
> > If the skb are linear and the kernel don't need to compute the udp
> > csum, only a handful of skb fields are required by udp_recvmsg().
> > Since we already use skb->dev_scratch to cache hot data, and
> > there are 32 bits unused on 64 bit archs, use such field to cache
> > as much data as we can, and try to prefetch on dequeue the relevant
> > fields that are left out.
> > 
> > This can save up to 2 cache miss per packet.
> > 
> > v1 -> v2:
> >   - changed udp_dev_scratch fields types to u{32,16} variant,
> >     replaced bitfiled with bool
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Acked-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 103 insertions(+), 11 deletions(-)
> 
> This appears to break wget on one of my machines.
> 
> Networking in general is working, I'm able to SSH in, but then I can't
> do a wget.

Can you please check if the following patch fixes the issue? Only
compiled tested here.

Thanks!!!
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607..80d89fe 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,19 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
 {
 	struct sk_buff *skb;
 
-	while ((skb = skb_peek(rcvq)) != NULL &&
-	       udp_lib_checksum_complete(skb)) {
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-				IS_UDPLITE(sk));
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-				IS_UDPLITE(sk));
-		atomic_inc(&sk->sk_drops);
-		__skb_unlink(skb, rcvq);
-		*total += skb->truesize;
-		kfree_skb(skb);
+	while ((skb = skb_peek(rcvq)) != NULL) {
+		if (udp_lib_checksum_complete(skb)) {
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+					IS_UDPLITE(sk));
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+					IS_UDPLITE(sk));
+			atomic_inc(&sk->sk_drops);
+			__skb_unlink(skb, rcvq);
+			*total += skb->truesize;
+			kfree_skb(skb);
+		} else {
+			udp_set_dev_scratch(skb);
+		}
 	}
 	return skb;
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 20:57       ` Paolo Abeni
@ 2017-06-22 21:18         ` Hannes Frederic Sowa
  -1 siblings, 0 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2017-06-22 21:18 UTC (permalink / raw)
  To: Paolo Abeni, Michael Ellerman
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
> 
> Can you please check if the following patch fixes the issue? Only
> compiled tested here.
> 
> Thanks!!!
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 067a607..80d89fe 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1446,16 +1446,19 @@ static struct sk_buff
> *__first_packet_length(struct sock *sk,
>  {
>  	struct sk_buff *skb;
>  
> -       while ((skb = skb_peek(rcvq)) != NULL &&
> -              udp_lib_checksum_complete(skb)) {
> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> -                               IS_UDPLITE(sk));
> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> -                               IS_UDPLITE(sk));
> -               atomic_inc(&sk->sk_drops);
> -               __skb_unlink(skb, rcvq);
> -               *total += skb->truesize;
> -               kfree_skb(skb);
> +       while ((skb = skb_peek(rcvq)) != NULL) {
> +               if (udp_lib_checksum_complete(skb)) {
> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> +                                       IS_UDPLITE(sk));
> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> +                                       IS_UDPLITE(sk));
> +                       atomic_inc(&sk->sk_drops);
> +                       __skb_unlink(skb, rcvq);
> +                       *total += skb->truesize;
> +                       kfree_skb(skb);
> +               } else {
> +                       udp_set_dev_scratch(skb);

It needs a "break;" here.

> +               }
>  	}
>  	return skb;
>  }

Bye,
Hannes

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-22 21:18         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Frederic Sowa @ 2017-06-22 21:18 UTC (permalink / raw)
  To: Paolo Abeni, Michael Ellerman
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
>=20
> Can you please check if the following patch fixes the issue? Only
> compiled tested here.
>=20
> Thanks!!!
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 067a607..80d89fe 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1446,16 +1446,19 @@ static struct sk_buff
> *__first_packet_length(struct sock *sk,
> =C2=A0{
> =C2=A0	struct sk_buff *skb;
> =C2=A0
> -       while ((skb =3D skb_peek(rcvq)) !=3D NULL &&
> -       =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0udp_lib_checksum_comple=
te(skb)) {
> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> -                               IS_UDPLITE(sk));
> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> -                               IS_UDPLITE(sk));
> -               atomic_inc(&sk->sk_drops);
> -               __skb_unlink(skb, rcvq);
> -               *total +=3D skb->truesize;
> -               kfree_skb(skb);
> +       while ((skb =3D skb_peek(rcvq)) !=3D NULL) {
> +               if (udp_lib_checksum_complete(skb)) {
> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> +                                       IS_UDPLITE(sk));
> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> +                                       IS_UDPLITE(sk));
> +                       atomic_inc(&sk->sk_drops);
> +                       __skb_unlink(skb, rcvq);
> +                       *total +=3D skb->truesize;
> +                       kfree_skb(skb);
> +               } else {
> +                       udp_set_dev_scratch(skb);

It needs a "break;" here.

> +               }
> =C2=A0	}
> =C2=A0	return skb;
> =C2=A0}

Bye,
Hannes

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-22 21:18         ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] " Hannes Frederic Sowa
@ 2017-06-23  6:59           ` Michael Ellerman
  -1 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-23  6:59 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

> On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
>> 
>> Can you please check if the following patch fixes the issue? Only
>> compiled tested here.
>> 
>> Thanks!!!
>> ---
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 067a607..80d89fe 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1446,16 +1446,19 @@ static struct sk_buff
>> *__first_packet_length(struct sock *sk,
>>  {
>>  	struct sk_buff *skb;
>>  
>> -       while ((skb = skb_peek(rcvq)) != NULL &&
>> -              udp_lib_checksum_complete(skb)) {
>> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> -                               IS_UDPLITE(sk));
>> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> -                               IS_UDPLITE(sk));
>> -               atomic_inc(&sk->sk_drops);
>> -               __skb_unlink(skb, rcvq);
>> -               *total += skb->truesize;
>> -               kfree_skb(skb);
>> +       while ((skb = skb_peek(rcvq)) != NULL) {
>> +               if (udp_lib_checksum_complete(skb)) {
>> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> +                                       IS_UDPLITE(sk));
>> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> +                                       IS_UDPLITE(sk));
>> +                       atomic_inc(&sk->sk_drops);
>> +                       __skb_unlink(skb, rcvq);
>> +                       *total += skb->truesize;
>> +                       kfree_skb(skb);
>> +               } else {
>> +                       udp_set_dev_scratch(skb);
>
> It needs a "break;" here.
>
>> +               }
>>  	}
>>  	return skb;
>>  }

That works!

$ wget google.com
--2017-06-23 16:56:31--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ [following]
--2017-06-23 16:56:31--  http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ
Reusing existing connection to proxy.pmdw.com:3128.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’


The patch had whitespace issues or something and I had to apply it by
hand, here's what I actually tested.

cheers

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607917f9..d3227c1bbe8e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,20 @@ static struct sk_buff *__first_packet_length(struct sock *sk,
 {
 	struct sk_buff *skb;
 
-	while ((skb = skb_peek(rcvq)) != NULL &&
-	       udp_lib_checksum_complete(skb)) {
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-				IS_UDPLITE(sk));
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-				IS_UDPLITE(sk));
-		atomic_inc(&sk->sk_drops);
-		__skb_unlink(skb, rcvq);
-		*total += skb->truesize;
-		kfree_skb(skb);
+	while ((skb = skb_peek(rcvq)) != NULL) {
+		if (udp_lib_checksum_complete(skb)) {
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+					IS_UDPLITE(sk));
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+					IS_UDPLITE(sk));
+			atomic_inc(&sk->sk_drops);
+			__skb_unlink(skb, rcvq);
+			*total += skb->truesize;
+			kfree_skb(skb);
+		} else {
+			udp_set_dev_scratch(skb);
+			break;
+		}
 	}
 	return skb;
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-23  6:59           ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-23  6:59 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Paolo Abeni
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

Hannes Frederic Sowa <hannes@stressinduktion.org> writes:

> On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
>>=20
>> Can you please check if the following patch fixes the issue? Only
>> compiled tested here.
>>=20
>> Thanks!!!
>> ---
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 067a607..80d89fe 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1446,16 +1446,19 @@ static struct sk_buff
>> *__first_packet_length(struct sock *sk,
>> =C2=A0{
>> =C2=A0	struct sk_buff *skb;
>> =C2=A0
>> -       while ((skb =3D skb_peek(rcvq)) !=3D NULL &&
>> -       =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0udp_lib_checksum_compl=
ete(skb)) {
>> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> -                               IS_UDPLITE(sk));
>> -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> -                               IS_UDPLITE(sk));
>> -               atomic_inc(&sk->sk_drops);
>> -               __skb_unlink(skb, rcvq);
>> -               *total +=3D skb->truesize;
>> -               kfree_skb(skb);
>> +       while ((skb =3D skb_peek(rcvq)) !=3D NULL) {
>> +               if (udp_lib_checksum_complete(skb)) {
>> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
>> +                                       IS_UDPLITE(sk));
>> +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
>> +                                       IS_UDPLITE(sk));
>> +                       atomic_inc(&sk->sk_drops);
>> +                       __skb_unlink(skb, rcvq);
>> +                       *total +=3D skb->truesize;
>> +                       kfree_skb(skb);
>> +               } else {
>> +                       udp_set_dev_scratch(skb);
>
> It needs a "break;" here.
>
>> +               }
>> =C2=A0	}
>> =C2=A0	return skb;
>> =C2=A0}

That works!

$ wget google.com
--2017-06-23 16:56:31--  http://google.com/
Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.google.com.au/?gfe_rd=3Dcr&ei=3Dn7tMWeb9JYPr8wfg4LXYAQ=
 [following]
--2017-06-23 16:56:31--  http://www.google.com.au/?gfe_rd=3Dcr&ei=3Dn7tMWeb=
9JYPr8wfg4LXYAQ
Reusing existing connection to proxy.pmdw.com:3128.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: =E2=80=98index.html=E2=80=99


The patch had whitespace issues or something and I had to apply it by
hand, here's what I actually tested.

cheers

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 067a607917f9..d3227c1bbe8e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1446,16 +1446,20 @@ static struct sk_buff *__first_packet_length(struct=
 sock *sk,
 {
 	struct sk_buff *skb;
=20
-	while ((skb =3D skb_peek(rcvq)) !=3D NULL &&
-	       udp_lib_checksum_complete(skb)) {
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
-				IS_UDPLITE(sk));
-		__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-				IS_UDPLITE(sk));
-		atomic_inc(&sk->sk_drops);
-		__skb_unlink(skb, rcvq);
-		*total +=3D skb->truesize;
-		kfree_skb(skb);
+	while ((skb =3D skb_peek(rcvq)) !=3D NULL) {
+		if (udp_lib_checksum_complete(skb)) {
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
+					IS_UDPLITE(sk));
+			__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
+					IS_UDPLITE(sk));
+			atomic_inc(&sk->sk_drops);
+			__skb_unlink(skb, rcvq);
+			*total +=3D skb->truesize;
+			kfree_skb(skb);
+		} else {
+			udp_set_dev_scratch(skb);
+			break;
+		}
 	}
 	return skb;
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-23  6:59           ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] " Michael Ellerman
@ 2017-06-23 11:59             ` Paolo Abeni
  -1 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-23 11:59 UTC (permalink / raw)
  To: Michael Ellerman, Hannes Frederic Sowa
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Fri, 2017-06-23 at 16:59 +1000, Michael Ellerman wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> 
> > On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
> > > 
> > > Can you please check if the following patch fixes the issue? Only
> > > compiled tested here.
> > > 
> > > Thanks!!!
> > > ---
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 067a607..80d89fe 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1446,16 +1446,19 @@ static struct sk_buff
> > > *__first_packet_length(struct sock *sk,
> > >  {
> > >  	struct sk_buff *skb;
> > >  
> > > -       while ((skb = skb_peek(rcvq)) != NULL &&
> > > -              udp_lib_checksum_complete(skb)) {
> > > -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > -                               IS_UDPLITE(sk));
> > > -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > -                               IS_UDPLITE(sk));
> > > -               atomic_inc(&sk->sk_drops);
> > > -               __skb_unlink(skb, rcvq);
> > > -               *total += skb->truesize;
> > > -               kfree_skb(skb);
> > > +       while ((skb = skb_peek(rcvq)) != NULL) {
> > > +               if (udp_lib_checksum_complete(skb)) {
> > > +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > +                                       IS_UDPLITE(sk));
> > > +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > +                                       IS_UDPLITE(sk));
> > > +                       atomic_inc(&sk->sk_drops);
> > > +                       __skb_unlink(skb, rcvq);
> > > +                       *total += skb->truesize;
> > > +                       kfree_skb(skb);
> > > +               } else {
> > > +                       udp_set_dev_scratch(skb);
> > 
> > It needs a "break;" here.
> > 
> > > +               }
> > >  	}
> > >  	return skb;
> > >  }
> 
> That works!
> 
> $ wget google.com
> --2017-06-23 16:56:31--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> Proxy request sent, awaiting response... 302 Found
> Location: http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ [following]
> --2017-06-23 16:56:31--  http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ
> Reusing existing connection to proxy.pmdw.com:3128.
> Proxy request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
> 
> 
> The patch had whitespace issues or something and I had to apply it by
> hand, here's what I actually tested.

Thank you!

I'll submit formally the patch after some more testing.

I noticed this version has entered the ppc patchwork, but I think that
the formal submission should go towards the net-next tree.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-23 11:59             ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-06-23 11:59 UTC (permalink / raw)
  To: Michael Ellerman, Hannes Frederic Sowa
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

On Fri, 2017-06-23 at 16:59 +1000, Michael Ellerman wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> 
> > On Thu, Jun 22, 2017, at 22:57, Paolo Abeni wrote:
> > > 
> > > Can you please check if the following patch fixes the issue? Only
> > > compiled tested here.
> > > 
> > > Thanks!!!
> > > ---
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 067a607..80d89fe 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1446,16 +1446,19 @@ static struct sk_buff
> > > *__first_packet_length(struct sock *sk,
> > >  {
> > >  	struct sk_buff *skb;
> > >  
> > > -       while ((skb = skb_peek(rcvq)) != NULL &&
> > > -              udp_lib_checksum_complete(skb)) {
> > > -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > -                               IS_UDPLITE(sk));
> > > -               __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > -                               IS_UDPLITE(sk));
> > > -               atomic_inc(&sk->sk_drops);
> > > -               __skb_unlink(skb, rcvq);
> > > -               *total += skb->truesize;
> > > -               kfree_skb(skb);
> > > +       while ((skb = skb_peek(rcvq)) != NULL) {
> > > +               if (udp_lib_checksum_complete(skb)) {
> > > +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS,
> > > +                                       IS_UDPLITE(sk));
> > > +                       __UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
> > > +                                       IS_UDPLITE(sk));
> > > +                       atomic_inc(&sk->sk_drops);
> > > +                       __skb_unlink(skb, rcvq);
> > > +                       *total += skb->truesize;
> > > +                       kfree_skb(skb);
> > > +               } else {
> > > +                       udp_set_dev_scratch(skb);
> > 
> > It needs a "break;" here.
> > 
> > > +               }
> > >  	}
> > >  	return skb;
> > >  }
> 
> That works!
> 
> $ wget google.com
> --2017-06-23 16:56:31--  http://google.com/
> Resolving proxy.pmdw.com (proxy.pmdw.com)... 10.1.2.3
> Connecting to proxy.pmdw.com (proxy.pmdw.com)|10.1.2.3|:3128... connected.
> Proxy request sent, awaiting response... 302 Found
> Location: http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ [following]
> --2017-06-23 16:56:31--  http://www.google.com.au/?gfe_rd=cr&ei=n7tMWeb9JYPr8wfg4LXYAQ
> Reusing existing connection to proxy.pmdw.com:3128.
> Proxy request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: ‘index.html’
> 
> 
> The patch had whitespace issues or something and I had to apply it by
> hand, here's what I actually tested.

Thank you!

I'll submit formally the patch after some more testing.

I noticed this version has entered the ppc patchwork, but I think that
the formal submission should go towards the net-next tree.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] udp: try to avoid 2 cache miss on dequeue)
  2017-06-23 11:59             ` Paolo Abeni
@ 2017-06-26  3:15               ` Michael Ellerman
  -1 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-26  3:15 UTC (permalink / raw)
  To: Paolo Abeni, Hannes Frederic Sowa
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

Paolo Abeni <pabeni@redhat.com> writes:
> Thank you!
>
> I'll submit formally the patch after some more testing.

Thanks.

> I noticed this version has entered the ppc patchwork, but I think that
> the formal submission should go towards the net-next tree.

Yeah it picks up all patches sent to the list. That's fine I'll just
mark it "Not applicable", and expect to see it arrive via net-next.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue)
@ 2017-06-26  3:15               ` Michael Ellerman
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Ellerman @ 2017-06-26  3:15 UTC (permalink / raw)
  To: Paolo Abeni, Hannes Frederic Sowa
  Cc: David S. Miller, Eric Dumazet, netdev, linuxppc-dev

Paolo Abeni <pabeni@redhat.com> writes:
> Thank you!
>
> I'll submit formally the patch after some more testing.

Thanks.

> I noticed this version has entered the ppc patchwork, but I think that
> the formal submission should go towards the net-next tree.

Yeah it picks up all patches sent to the list. That's fine I'll just
mark it "Not applicable", and expect to see it arrive via net-next.

cheers

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue
  2017-06-12  9:23 ` [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue Paolo Abeni
@ 2017-07-17  7:04   ` Eric Dumazet
  2017-07-17 20:59     ` [PATCH net] udp: preserve skb->dst if required for IP options processing Paolo Abeni
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2017-07-17  7:04 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet

On Mon, 2017-06-12 at 11:23 +0200, Paolo Abeni wrote:
> Since UDP no more uses sk->destructor, we can clear completely
> the skb head state before enqueuing. Amend and use
> skb_release_head_state() for that.
> 
> All head states share a single cacheline, which is not
> normally used/accesses on dequeue. We can avoid entirely accessing
> such cacheline implementing and using in the UDP code a specialized
> skb free helper which ignores the skb head state.
> 
> This saves a cacheline miss at skb deallocation time.
> 
> v1 -> v2:
>   replaced secpath_reset() with skb_release_head_state()
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/skbuff.h |  2 ++
>  net/core/skbuff.c      | 24 ++++++++++++++++++++----
>  net/ipv4/udp.c         |  6 +++++-
>  3 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index decce36..d66d4fe 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -880,10 +880,12 @@ static inline bool skb_unref(struct sk_buff *skb)
>  	return true;
>  }
>  
> +void skb_release_head_state(struct sk_buff *skb);
>  void kfree_skb(struct sk_buff *skb);
>  void kfree_skb_list(struct sk_buff *segs);
>  void skb_tx_error(struct sk_buff *skb);
>  void consume_skb(struct sk_buff *skb);
> +void consume_stateless_skb(struct sk_buff *skb);
>  void  __kfree_skb(struct sk_buff *skb);
>  extern struct kmem_cache *skbuff_head_cache;
>  
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 747263c..3046027 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -643,12 +643,10 @@ static void kfree_skbmem(struct sk_buff *skb)
>  	kmem_cache_free(skbuff_fclone_cache, fclones);
>  }
>  
> -static void skb_release_head_state(struct sk_buff *skb)
> +void skb_release_head_state(struct sk_buff *skb)
>  {
>  	skb_dst_drop(skb);
> -#ifdef CONFIG_XFRM
> -	secpath_put(skb->sp);
> -#endif
> +	secpath_reset(skb);
>  	if (skb->destructor) {
>  		WARN_ON(in_irq());
>  		skb->destructor(skb);
> @@ -751,6 +749,24 @@ void consume_skb(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(consume_skb);
>  
> +/**
> + *	consume_stateless_skb - free an skbuff, assuming it is stateless
> + *	@skb: buffer to free
> + *
> + *	Works like consume_skb(), but this variant assumes that all the head
> + *	states have been already dropped.
> + */
> +void consume_stateless_skb(struct sk_buff *skb)
> +{
> +	if (!skb_unref(skb))
> +		return;
> +
> +	trace_consume_skb(skb);
> +	if (likely(skb->head))
> +		skb_release_data(skb);
> +	kfree_skbmem(skb);
> +}
> +
>  void __kfree_skb_flush(void)
>  {
>  	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index fdcb743..d8b265f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1359,7 +1359,8 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
>  		sk_peek_offset_bwd(sk, len);
>  		unlock_sock_fast(sk, slow);
>  	}
> -	consume_skb(skb);
> +
> +	consume_stateless_skb(skb);
>  }
>  EXPORT_SYMBOL_GPL(skb_consume_udp);
>  
> @@ -1739,6 +1740,9 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  		sk_mark_napi_id_once(sk, skb);
>  	}
>  
> +	/* clear all pending head states while they are hot in the cache */
> +	skb_release_head_state(skb);
> +
>  	rc = __udp_enqueue_schedule_skb(sk, skb);
>  	if (rc < 0) {
>  		int is_udplite = IS_UDPLITE(sk);


Oh well, we forgot that we need to access IP header when/if
__ip_options_echo() needs it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH net] udp: preserve skb->dst if required for IP options processing
  2017-07-17  7:04   ` Eric Dumazet
@ 2017-07-17 20:59     ` Paolo Abeni
  2017-07-17 21:26       ` Eric Dumazet
  2017-07-17 21:30       ` Eric Dumazet
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-07-17 20:59 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

Eric noticed that in udp_recvmsg() we still need to access
skb->dst while processing the IP options.
Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
skb->dst is no more available at recvmsg() time and bad things
will happen if we enter the relevant code path.

This commit address the issue, avoid clearing skb->dst if
any IP options are present into the relevant skb.
Since the IP CB is contained in the first skb cacheline, we can
test it to decide to leverage the consume_stateless_skb()
optimization, without measurable additional cost in the faster
path.

Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 25294d4..b057653 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1388,6 +1388,11 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		unlock_sock_fast(sk, slow);
 	}
 
+	/* we cleared the head states previously only if the skb lacks any IP
+	 * options, see __udp_queue_rcv_skb().
+	 */
+	if (unlikely(IPCB(skb)->opt.optlen > 0))
+		skb_release_head_state(skb);
 	consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
@@ -1779,8 +1784,12 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk_mark_napi_id_once(sk, skb);
 	}
 
-	/* clear all pending head states while they are hot in the cache */
-	skb_release_head_state(skb);
+	/* At recvmsg() time we need skb->dst to process IP options-related
+	 * cmsg, elsewhere can we clear all pending head states while they are
+	 * hot in the cache
+	 */
+	if (likely(IPCB(skb)->opt.optlen == 0))
+		skb_release_head_state(skb);
 
 	rc = __udp_enqueue_schedule_skb(sk, skb);
 	if (rc < 0) {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH net] udp: preserve skb->dst if required for IP options processing
  2017-07-17 20:59     ` [PATCH net] udp: preserve skb->dst if required for IP options processing Paolo Abeni
@ 2017-07-17 21:26       ` Eric Dumazet
  2017-07-18  7:50         ` Paolo Abeni
  2017-07-17 21:30       ` Eric Dumazet
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2017-07-17 21:26 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet, Hannes Frederic Sowa

On Mon, 2017-07-17 at 22:59 +0200, Paolo Abeni wrote:
> Eric noticed that in udp_recvmsg() we still need to access
> skb->dst while processing the IP options.
> Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
> skb->dst is no more available at recvmsg() time and bad things
> will happen if we enter the relevant code path.
> 
> This commit address the issue, avoid clearing skb->dst if
> any IP options are present into the relevant skb.
> Since the IP CB is contained in the first skb cacheline, we can
> test it to decide to leverage the consume_stateless_skb()
> optimization, without measurable additional cost in the faster
> path.
> 
> Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")

Strange.... I thought bug was coming from
commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")

Please also add original reporter, from syzkaller team.

Reported-by: Andrey Konovalov <andreyknvl@google.com>

> Reported-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/udp.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 25294d4..b057653 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1388,6 +1388,11 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
>  		unlock_sock_fast(sk, slow);
>  	}
>  
> +	/* we cleared the head states previously only if the skb lacks any IP
> +	 * options, see __udp_queue_rcv_skb().
> +	 */
> +	if (unlikely(IPCB(skb)->opt.optlen > 0))
> +		skb_release_head_state(skb);
>  	consume_stateless_skb(skb);
>  }
>  EXPORT_SYMBOL_GPL(skb_consume_udp);
> @@ -1779,8 +1784,12 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  		sk_mark_napi_id_once(sk, skb);
>  	}
>  
> -	/* clear all pending head states while they are hot in the cache */
> -	skb_release_head_state(skb);
> +	/* At recvmsg() time we need skb->dst to process IP options-related
> +	 * cmsg, elsewhere can we clear all pending head states while they are
> +	 * hot in the cache
> +	 */
> +	if (likely(IPCB(skb)->opt.optlen == 0))
> +		skb_release_head_state(skb);
>  
>  	rc = __udp_enqueue_schedule_skb(sk, skb);
>  	if (rc < 0) {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net] udp: preserve skb->dst if required for IP options processing
  2017-07-17 20:59     ` [PATCH net] udp: preserve skb->dst if required for IP options processing Paolo Abeni
  2017-07-17 21:26       ` Eric Dumazet
@ 2017-07-17 21:30       ` Eric Dumazet
  2017-07-17 21:35         ` Eric Dumazet
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Dumazet @ 2017-07-17 21:30 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet, Hannes Frederic Sowa

On Mon, 2017-07-17 at 22:59 +0200, Paolo Abeni wrote:
> Eric noticed that in udp_recvmsg() we still need to access
> skb->dst while processing the IP options.
> Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
> skb->dst is no more available at recvmsg() time and bad things
> will happen if we enter the relevant code path.
> 
> This commit address the issue, avoid clearing skb->dst if
> any IP options are present into the relevant skb.
> Since the IP CB is contained in the first skb cacheline, we can
> test it to decide to leverage the consume_stateless_skb()
> optimization, without measurable additional cost in the faster
> path.

Also the problem was not skb->dst at all, but fact that 
skb->head can not be freed if IP options need later to be fetched.

( __ip_options_echo() -> 
   sptr = skb_network_header(skb);

... -> use after free

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net] udp: preserve skb->dst if required for IP options processing
  2017-07-17 21:30       ` Eric Dumazet
@ 2017-07-17 21:35         ` Eric Dumazet
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Dumazet @ 2017-07-17 21:35 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev, David S. Miller, Eric Dumazet, Hannes Frederic Sowa

On Mon, 2017-07-17 at 14:30 -0700, Eric Dumazet wrote:
> On Mon, 2017-07-17 at 22:59 +0200, Paolo Abeni wrote:
> > Eric noticed that in udp_recvmsg() we still need to access
> > skb->dst while processing the IP options.
> > Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
> > skb->dst is no more available at recvmsg() time and bad things
> > will happen if we enter the relevant code path.
> > 
> > This commit address the issue, avoid clearing skb->dst if
> > any IP options are present into the relevant skb.
> > Since the IP CB is contained in the first skb cacheline, we can
> > test it to decide to leverage the consume_stateless_skb()
> > optimization, without measurable additional cost in the faster
> > path.
> 
> Also the problem was not skb->dst at all, but fact that 
> skb->head can not be freed if IP options need later to be fetched.
> 
> ( __ip_options_echo() -> 
>    sptr = skb_network_header(skb);
> 
> ... -> use after free


Oh well, I read again the whole thing, and it seems you are right.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH net] udp: preserve skb->dst if required for IP options processing
  2017-07-17 21:26       ` Eric Dumazet
@ 2017-07-18  7:50         ` Paolo Abeni
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Abeni @ 2017-07-18  7:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David S. Miller, Eric Dumazet, Hannes Frederic Sowa

On Mon, 2017-07-17 at 14:26 -0700, Eric Dumazet wrote:
> On Mon, 2017-07-17 at 22:59 +0200, Paolo Abeni wrote:
> > Eric noticed that in udp_recvmsg() we still need to access
> > skb->dst while processing the IP options.
> > Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")
> > skb->dst is no more available at recvmsg() time and bad things
> > will happen if we enter the relevant code path.
> > 
> > This commit address the issue, avoid clearing skb->dst if
> > any IP options are present into the relevant skb.
> > Since the IP CB is contained in the first skb cacheline, we can
> > test it to decide to leverage the consume_stateless_skb()
> > optimization, without measurable additional cost in the faster
> > path.
> > 
> > Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
> 
> Strange.... I thought bug was coming from
> commit 0a463c78d25b ("udp: avoid a cache miss on dequeue")

You are right - and I was wrong with this tag.
> 
> Please also add original reporter, from syzkaller team.
> 
> Reported-by: Andrey Konovalov <andreyknvl@google.com>

I'll resubmit soon a v2 with the update tags. I'm not sure if I must or
must not retain your ack ?

BTW I think that __ip_options_echo() also needs to access skb->dev, via
fib_compute_spec_dst(), and perhaps another patch is needed. I'll look
at that today.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-07-18  7:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-12  9:23 [PATCH net-next v3 0/3] udp: reduce cache pressure Paolo Abeni
2017-06-12  9:23 ` [PATCH net-next v3 1/3] net: factor out a helper to decrement the skb refcount Paolo Abeni
2017-06-12  9:23 ` [PATCH net-next v3 2/3] udp: avoid a cache miss on dequeue Paolo Abeni
2017-07-17  7:04   ` Eric Dumazet
2017-07-17 20:59     ` [PATCH net] udp: preserve skb->dst if required for IP options processing Paolo Abeni
2017-07-17 21:26       ` Eric Dumazet
2017-07-18  7:50         ` Paolo Abeni
2017-07-17 21:30       ` Eric Dumazet
2017-07-17 21:35         ` Eric Dumazet
2017-06-12  9:23 ` [PATCH net-next v3 3/3] udp: try to avoid 2 cache miss on dequeue Paolo Abeni
2017-06-22 13:06   ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] udp: try to avoid 2 cache miss on dequeue) Michael Ellerman
2017-06-22 13:06     ` Michael Ellerman
2017-06-22 16:43     ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] " Paolo Abeni
2017-06-22 16:43       ` Paolo Abeni
2017-06-22 20:27       ` Paolo Abeni
2017-06-22 20:27         ` Paolo Abeni
2017-06-22 20:28       ` Paolo Abeni
2017-06-22 20:28         ` Paolo Abeni
2017-06-22 20:57     ` Paolo Abeni
2017-06-22 20:57       ` Paolo Abeni
2017-06-22 21:18       ` Hannes Frederic Sowa
2017-06-22 21:18         ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] " Hannes Frederic Sowa
2017-06-23  6:59         ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] " Michael Ellerman
2017-06-23  6:59           ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] " Michael Ellerman
2017-06-23 11:59           ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next,v3,3/3] " Paolo Abeni
2017-06-23 11:59             ` Paolo Abeni
2017-06-26  3:15             ` Michael Ellerman
2017-06-26  3:15               ` DNS (?) not working on G5 (64-bit powerpc) (was [net-next, v3, 3/3] " Michael Ellerman
2017-06-12 14:01 ` [PATCH net-next v3 0/3] udp: reduce cache pressure David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.