netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/3] udp: behave under memory pressure
@ 2020-01-17 17:27 Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 1/3] net: generic enter_memory_pressure implementation Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Paolo Abeni @ 2020-01-17 17:27 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn

Williem reported that in some scenarios the UDP protocol can keep a lot of
memory in use on an idle system. He also diagnosed the root cause in the
forward allocated memory bulk free.

This series addresses the issue adding memory pressure tracking for the UDP
protocol, and flushing the fwd allocated memory if the protocol is under
memory pressure.

The first two patches clean-up the current memory pressure helpers for UDP
usage, and the 3rd one is the actual fix.

Targeting the net tree, as this addresses a reported issue. I guess even
net-next can be considered a valid target, as this also changes slightly the
protocol behavior under memory pressure. Please advise on the preferred option.

Paolo Abeni (3):
  net: generic enter_memory_pressure implementation.
  net: add annotation to memory_pressure lockless access
  udp: avoid bulk memory scheduling on memory pressure.

 include/net/sock.h |  4 ++--
 include/net/udp.h  |  2 ++
 net/core/sock.c    | 10 +++++++---
 net/ipv4/udp.c     | 13 ++++++++++++-
 net/ipv6/udp.c     |  2 ++
 5 files changed, 25 insertions(+), 6 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net 1/3] net: generic enter_memory_pressure implementation.
  2020-01-17 17:27 [PATCH net 0/3] udp: behave under memory pressure Paolo Abeni
@ 2020-01-17 17:27 ` Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 2/3] net: add annotations to memory_pressure lockless access Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure Paolo Abeni
  2 siblings, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2020-01-17 17:27 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn

Currently sk_leave_memory_pressure() offers a generic implementation
for protocol lacking the leave_memory_pressure() helper, but
supporting the memory pressure model.

sk_enter_memory_pressure() lacks such bits. As a result we get code
duplication and additional, avoidable indirect calls.

This change provides a generic implementation for entering memory pressure,
similar to the existing sk_leave_memory_pressure().

Note: no existing protocol is affected, until the existing
enter_memory_pressure helper is removed from the relevant 'proto'
struct.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/sock.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 8459ad579f73..8cf24dca9bde 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2323,10 +2323,14 @@ EXPORT_SYMBOL(sock_cmsg_send);
 
 static void sk_enter_memory_pressure(struct sock *sk)
 {
-	if (!sk->sk_prot->enter_memory_pressure)
-		return;
+	if (sk->sk_prot->enter_memory_pressure) {
+		sk->sk_prot->enter_memory_pressure(sk);
+	} else {
+		unsigned long *memory_pressure = sk->sk_prot->memory_pressure;
 
-	sk->sk_prot->enter_memory_pressure(sk);
+		if (memory_pressure && !READ_ONCE(*memory_pressure))
+			WRITE_ONCE(*memory_pressure, 1);
+	}
 }
 
 static void sk_leave_memory_pressure(struct sock *sk)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 2/3] net: add annotations to memory_pressure lockless access
  2020-01-17 17:27 [PATCH net 0/3] udp: behave under memory pressure Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 1/3] net: generic enter_memory_pressure implementation Paolo Abeni
@ 2020-01-17 17:27 ` Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure Paolo Abeni
  2 siblings, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2020-01-17 17:27 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn

The proto memory pressure status is updated without any related lock
held. This patch adds annotations to document this fact and avoid
future syzbot complains.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/sock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8dff68b4c316..08383624b8cb 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1264,7 +1264,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
-	return !!*sk->sk_prot->memory_pressure;
+	return !!READ_ONCE(*sk->sk_prot->memory_pressure);
 }
 
 static inline long
@@ -1318,7 +1318,7 @@ proto_memory_pressure(struct proto *prot)
 {
 	if (!prot->memory_pressure)
 		return false;
-	return !!*prot->memory_pressure;
+	return !!READ_ONCE(*prot->memory_pressure);
 }
 
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure.
  2020-01-17 17:27 [PATCH net 0/3] udp: behave under memory pressure Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 1/3] net: generic enter_memory_pressure implementation Paolo Abeni
  2020-01-17 17:27 ` [PATCH net 2/3] net: add annotations to memory_pressure lockless access Paolo Abeni
@ 2020-01-17 17:27 ` Paolo Abeni
  2020-01-17 17:51   ` Eric Dumazet
  2 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2020-01-17 17:27 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn

Williem reported that after commit 0d4a6608f68c ("udp: do rmem bulk
free even if the rx sk queue is empty") the memory allocated by
an almost idle system with many UDP sockets can grow a lot.

This change addresses the issue enabling memory pressure tracking
for UDP and flushing the fwd allocated memory on dequeue if the
UDP protocol is under memory pressure.

Note that with this patch applied, the system allocates more
liberally memory for UDP sockets while the total memory usage is
below udp_mem[1], while the vanilla kernel would allow at most a
single page per socket when UDP memory usage goes above udp_mem[0]
- see __sk_mem_raise_allocated().

Reported-and-diagnosed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Fixes: commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/udp.h |  2 ++
 net/ipv4/udp.c    | 13 ++++++++++++-
 net/ipv6/udp.c    |  2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index bad74f780831..cff730798291 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -94,6 +94,8 @@ static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
 extern struct proto udp_prot;
 
 extern atomic_long_t udp_memory_allocated;
+extern unsigned long udp_memory_pressure;
+extern struct percpu_counter udp_sockets_allocated;
 
 /* sysctl variables for udp */
 extern long sysctl_udp_mem[3];
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 93a355b6b092..3a68ec6c3410 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -119,6 +119,12 @@ EXPORT_SYMBOL(udp_table);
 long sysctl_udp_mem[3] __read_mostly;
 EXPORT_SYMBOL(sysctl_udp_mem);
 
+unsigned long udp_memory_pressure __read_mostly;
+EXPORT_SYMBOL(udp_memory_pressure);
+
+struct percpu_counter udp_sockets_allocated;
+EXPORT_SYMBOL(udp_sockets_allocated);
+
 atomic_long_t udp_memory_allocated;
 EXPORT_SYMBOL(udp_memory_allocated);
 
@@ -1368,7 +1374,8 @@ static void udp_rmem_release(struct sock *sk, int size, int partial,
 	if (likely(partial)) {
 		up->forward_deficit += size;
 		size = up->forward_deficit;
-		if (size < (sk->sk_rcvbuf >> 2))
+		if (size < (sk->sk_rcvbuf >> 2) &&
+		    !READ_ONCE(udp_memory_pressure))
 			return;
 	} else {
 		size += up->forward_deficit;
@@ -2789,7 +2796,9 @@ struct proto udp_prot = {
 	.unhash			= udp_lib_unhash,
 	.rehash			= udp_v4_rehash,
 	.get_port		= udp_v4_get_port,
+	.memory_pressure	= &udp_memory_pressure,
 	.memory_allocated	= &udp_memory_allocated,
+	.sockets_allocated	= &udp_sockets_allocated,
 	.sysctl_mem		= sysctl_udp_mem,
 	.sysctl_wmem_offset	= offsetof(struct net, ipv4.sysctl_udp_wmem_min),
 	.sysctl_rmem_offset	= offsetof(struct net, ipv4.sysctl_udp_rmem_min),
@@ -3062,6 +3071,8 @@ void __init udp_init(void)
 	sysctl_udp_mem[1] = limit;
 	sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
 
+	percpu_counter_init(&udp_sockets_allocated, 0, GFP_KERNEL);
+
 	__udp_sysctl_init(&init_net);
 
 	/* 16 spinlocks per cpu */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9fec580c968e..b29d92574ccc 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1670,7 +1670,9 @@ struct proto udpv6_prot = {
 	.unhash			= udp_lib_unhash,
 	.rehash			= udp_v6_rehash,
 	.get_port		= udp_v6_get_port,
+	.memory_pressure	= &udp_memory_pressure,
 	.memory_allocated	= &udp_memory_allocated,
+	.sockets_allocated	= &udp_sockets_allocated,
 	.sysctl_mem		= sysctl_udp_mem,
 	.sysctl_wmem_offset     = offsetof(struct net, ipv4.sysctl_udp_wmem_min),
 	.sysctl_rmem_offset     = offsetof(struct net, ipv4.sysctl_udp_rmem_min),
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure.
  2020-01-17 17:27 ` [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure Paolo Abeni
@ 2020-01-17 17:51   ` Eric Dumazet
  2020-01-17 18:37     ` Paolo Abeni
  2020-01-17 18:38     ` Willem de Bruijn
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2020-01-17 17:51 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn



On 1/17/20 9:27 AM, Paolo Abeni wrote:
> Williem reported that after commit 0d4a6608f68c ("udp: do rmem bulk
> free even if the rx sk queue is empty") the memory allocated by
> an almost idle system with many UDP sockets can grow a lot.
> 
> This change addresses the issue enabling memory pressure tracking
> for UDP and flushing the fwd allocated memory on dequeue if the
> UDP protocol is under memory pressure.
> 
> Note that with this patch applied, the system allocates more
> liberally memory for UDP sockets while the total memory usage is
> below udp_mem[1], while the vanilla kernel would allow at most a
> single page per socket when UDP memory usage goes above udp_mem[0]
> - see __sk_mem_raise_allocated().
> 
> Reported-and-diagnosed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Fixes: commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty")

Not a proper Fixes: tag

Frankly I would rather revert this patch, unless you show how much things were improved.

Where in the UDP code the forward allocations will be released while udp_memory_pressure
is hit ?

TCP has many calls to sk_mem_reclaim() and sk_mem_reclaim_partial() to try
to gracefully exit memory pressure.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure.
  2020-01-17 17:51   ` Eric Dumazet
@ 2020-01-17 18:37     ` Paolo Abeni
  2020-01-17 18:38     ` Willem de Bruijn
  1 sibling, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2020-01-17 18:37 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David S. Miller, Willem de Bruijn

Hi,

On Fri, 2020-01-17 at 09:51 -0800, Eric Dumazet wrote:
> On 1/17/20 9:27 AM, Paolo Abeni wrote:
> > Williem reported that after commit 0d4a6608f68c ("udp: do rmem bulk
> > free even if the rx sk queue is empty") the memory allocated by
> > an almost idle system with many UDP sockets can grow a lot.
> > 
> > This change addresses the issue enabling memory pressure tracking
> > for UDP and flushing the fwd allocated memory on dequeue if the
> > UDP protocol is under memory pressure.
> > 
> > Note that with this patch applied, the system allocates more
> > liberally memory for UDP sockets while the total memory usage is
> > below udp_mem[1], while the vanilla kernel would allow at most a
> > single page per socket when UDP memory usage goes above udp_mem[0]
> > - see __sk_mem_raise_allocated().
> > 
> > Reported-and-diagnosed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Fixes: commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty")

Thank you for the feedback.

> Not a proper Fixes: tag
> 
> Frankly I would rather revert this patch, unless you show how much things were improved.

unpatched version:

# ensure we will hit memory pressure
echo "5000 10000 20000" > /proc/sys/net/ipv4/udp_mem

# run the repro from Willem
./repro.py

# it get stuck after opening a bunch of sockets, because
# __udp_enqueue_schedule_skb() hits the memory limit
# and packets are dropped.

patched kernel:
echo "5000 10000 20000" > /proc/sys/net/ipv4/udp_mem
./repro.py
# completes successfully, output trimmed
sockets: used 10179
TCP: inuse 4 orphan 0 tw 0 alloc 7 mem 1
UDP: inuse 4 mem 19860
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

To complete successfully with an unpatched kernel the reproducer
requires memory limits 1 or 2 order of magnitude greaters.

> Where in the UDP code the forward allocations will be released while udp_memory_pressure
> is hit ?

fwd memory is released by udp_rmem_release(), so every time some
process tries to read from any UDP socket. Surely less effective than
the TCP infrastructure, but that other option looks overkill for UDP?!?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure.
  2020-01-17 17:51   ` Eric Dumazet
  2020-01-17 18:37     ` Paolo Abeni
@ 2020-01-17 18:38     ` Willem de Bruijn
  1 sibling, 0 replies; 7+ messages in thread
From: Willem de Bruijn @ 2020-01-17 18:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paolo Abeni, Network Development, David S. Miller

On Fri, Jan 17, 2020 at 12:51 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 1/17/20 9:27 AM, Paolo Abeni wrote:
> > Williem reported that after commit 0d4a6608f68c ("udp: do rmem bulk
> > free even if the rx sk queue is empty") the memory allocated by
> > an almost idle system with many UDP sockets can grow a lot.
> >
> > This change addresses the issue enabling memory pressure tracking
> > for UDP and flushing the fwd allocated memory on dequeue if the
> > UDP protocol is under memory pressure.
> >
> > Note that with this patch applied, the system allocates more
> > liberally memory for UDP sockets while the total memory usage is
> > below udp_mem[1], while the vanilla kernel would allow at most a
> > single page per socket when UDP memory usage goes above udp_mem[0]
> > - see __sk_mem_raise_allocated().
> >
> > Reported-and-diagnosed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Fixes: commit 0d4a6608f68c ("udp: do rmem bulk free even if the rx sk queue is empty")

Thanks a lot for the quick follow-up, Paolo!

> Not a proper Fixes: tag

And to give credit where it's due: Eric diagnosed the issue.

> Frankly I would rather revert this patch, unless you show how much things were improved.

In response to your question in the cover letter, I also think that
for net the three-line revert patch is more obviously correct.

Memory pressure might be helpful in net-next with some iteration, of course.

> Where in the UDP code the forward allocations will be released while udp_memory_pressure
> is hit ?
>
> TCP has many calls to sk_mem_reclaim() and sk_mem_reclaim_partial() to try
> to gracefully exit memory pressure.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-17 18:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 17:27 [PATCH net 0/3] udp: behave under memory pressure Paolo Abeni
2020-01-17 17:27 ` [PATCH net 1/3] net: generic enter_memory_pressure implementation Paolo Abeni
2020-01-17 17:27 ` [PATCH net 2/3] net: add annotations to memory_pressure lockless access Paolo Abeni
2020-01-17 17:27 ` [PATCH net 3/3] udp: avoid bulk memory scheduling on memory pressure Paolo Abeni
2020-01-17 17:51   ` Eric Dumazet
2020-01-17 18:37     ` Paolo Abeni
2020-01-17 18:38     ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).