linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] sock: Improve condition on sockmem pressure
@ 2023-05-22  7:01 Abel Wu
  2023-05-22  7:01 ` [PATCH v2 1/4] sock: Always take memcg pressure into consideration Abel Wu
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Abel Wu @ 2023-05-22  7:01 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Glauber Costa, netdev, linux-kernel, Abel Wu

Currently the memcg's status is also taken into account when judging
the socket's memory pressure, to alleviate memstalls in that memcg.
But there are still cases that can be improved. Please see the
individual patches for details.

v2:
  - Split into several patches and reworded the commit logs for
    better readability.
  - Make memcg's pressure consideration function-wide in
    __sk_mem_raise_allocated().

v1: https://lore.kernel.org/netdev/20230506085903.96133-1-wuyun.abel@bytedance.com/

Abel Wu (4):
  sock: Always take memcg pressure into consideration
  sock: Fix misuse of sk_under_memory_pressure()
  sock: Consider memcg pressure when raising sockmem
  sock: Remove redundant cond of memcg pressure

 include/net/sock.h | 11 +++++++----
 net/core/sock.c    | 32 +++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 13 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/4] sock: Always take memcg pressure into consideration
  2023-05-22  7:01 [PATCH v2 0/4] sock: Improve condition on sockmem pressure Abel Wu
@ 2023-05-22  7:01 ` Abel Wu
  2023-05-22  7:01 ` [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure() Abel Wu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-05-22  7:01 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Glauber Costa, netdev, linux-kernel, Abel Wu

sk_under_memory_pressure() is called to check whether there is memory
pressure related to the given socket. But currently it ignores the
net-memcg's pressure if the socket's proto does not track the global
pressure, which may put extra burden on that memcg's compaction or
reclaim path (also remember that socket memory is unreclaimable).

So always check the memcg's vm status, to alleviate memstalls when it
is under pressure.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 include/net/sock.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8b7ed7167243..c73d9bad7ac7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1411,14 +1411,12 @@ static inline bool sk_has_memory_pressure(const struct sock *sk)
 
 static inline bool sk_under_memory_pressure(const struct sock *sk)
 {
-	if (!sk->sk_prot->memory_pressure)
-		return false;
-
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
 	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
-	return !!*sk->sk_prot->memory_pressure;
+	return sk->sk_prot->memory_pressure &&
+		*sk->sk_prot->memory_pressure;
 }
 
 static inline long
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure()
  2023-05-22  7:01 [PATCH v2 0/4] sock: Improve condition on sockmem pressure Abel Wu
  2023-05-22  7:01 ` [PATCH v2 1/4] sock: Always take memcg pressure into consideration Abel Wu
@ 2023-05-22  7:01 ` Abel Wu
  2023-05-22 12:50   ` Simon Horman
  2023-05-22  7:01 ` [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem Abel Wu
  2023-05-22  7:01 ` [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure Abel Wu
  3 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-05-22  7:01 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Glauber Costa, netdev, linux-kernel, Abel Wu

The status of global socket memory pressure is updated when:

  a) __sk_mem_raise_allocated():

	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
	leave: sk_memory_allocated(sk) <= sysctl_mem[0]

  b) __sk_mem_reduce_allocated():

	leave: sk_under_memory_pressure(sk) &&
		sk_memory_allocated(sk) < sysctl_mem[0]

So the conditions for leaving global pressure are inconsistent, which
may lead to a situation where one pressured net-memcg prevents the
global pressure from being cleared even though there is in fact no
global pressure, so the global constraints unexpectedly remain in
effect on the other sockets.

This patch fixes this by ignoring the net-memcg's pressure when
deciding whether to leave global memory pressure.

Fixes: e1aab161e013 ("socket: initial cgroup code")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 include/net/sock.h | 9 +++++++--
 net/core/sock.c    | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c73d9bad7ac7..bf930d4db7f0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,14 +1409,19 @@ static inline bool sk_has_memory_pressure(const struct sock *sk)
 	return sk->sk_prot->memory_pressure != NULL;
 }
 
+static inline bool sk_under_global_memory_pressure(const struct sock *sk)
+{
+	return sk->sk_prot->memory_pressure &&
+		*sk->sk_prot->memory_pressure;
+}
+
 static inline bool sk_under_memory_pressure(const struct sock *sk)
 {
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
 	    mem_cgroup_under_socket_pressure(sk->sk_memcg))
 		return true;
 
-	return sk->sk_prot->memory_pressure &&
-		*sk->sk_prot->memory_pressure;
+	return sk_under_global_memory_pressure(sk);
 }
 
 static inline long
diff --git a/net/core/sock.c b/net/core/sock.c
index 5440e67bcfe3..801df091e37a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3095,7 +3095,7 @@ void __sk_mem_reduce_allocated(struct sock *sk, int amount)
 	if (mem_cgroup_sockets_enabled && sk->sk_memcg)
 		mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
 
-	if (sk_under_memory_pressure(sk) &&
+	if (sk_under_global_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
 		sk_leave_memory_pressure(sk);
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-22  7:01 [PATCH v2 0/4] sock: Improve condition on sockmem pressure Abel Wu
  2023-05-22  7:01 ` [PATCH v2 1/4] sock: Always take memcg pressure into consideration Abel Wu
  2023-05-22  7:01 ` [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure() Abel Wu
@ 2023-05-22  7:01 ` Abel Wu
  2023-05-22 12:53   ` Simon Horman
  2023-05-25  1:22   ` Shakeel Butt
  2023-05-22  7:01 ` [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure Abel Wu
  3 siblings, 2 replies; 13+ messages in thread
From: Abel Wu @ 2023-05-22  7:01 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Glauber Costa, netdev, linux-kernel, Abel Wu

For now __sk_mem_raise_allocated() mainly considers global socket
memory pressure and allows to raise if no global pressure observed,
including the sockets whose memcgs are in pressure, which might
result in longer memcg memstall.

So take net-memcg's pressure into consideration when allocating
socket memory to alleviate long tail latencies.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 net/core/sock.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 801df091e37a..7641d64293af 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 {
 	bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
 	struct proto *prot = sk->sk_prot;
-	bool charged = true;
+	bool charged = true, pressured = false;
 	long allocated;
 
 	sk_memory_allocated_add(sk, amt);
 	allocated = sk_memory_allocated(sk);
-	if (memcg_charge &&
-	    !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
-						gfp_memcg_charge())))
-		goto suppress_allocation;
+
+	if (memcg_charge) {
+		charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
+						  gfp_memcg_charge());
+		if (!charged)
+			goto suppress_allocation;
+		if (mem_cgroup_under_socket_pressure(sk->sk_memcg))
+			pressured = true;
+	}
 
 	/* Under limit. */
-	if (allocated <= sk_prot_mem_limits(sk, 0)) {
+	if (allocated <= sk_prot_mem_limits(sk, 0))
 		sk_leave_memory_pressure(sk);
+	else
+		pressured = true;
+
+	/* No pressure observed in global/memcg. */
+	if (!pressured)
 		return 1;
-	}
 
 	/* Under pressure. */
 	if (allocated > sk_prot_mem_limits(sk, 1))
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure
  2023-05-22  7:01 [PATCH v2 0/4] sock: Improve condition on sockmem pressure Abel Wu
                   ` (2 preceding siblings ...)
  2023-05-22  7:01 ` [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem Abel Wu
@ 2023-05-22  7:01 ` Abel Wu
  2023-05-22 12:54   ` Simon Horman
  3 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-05-22  7:01 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Glauber Costa, netdev, linux-kernel, Abel Wu

Now with the preivous patch, __sk_mem_raise_allocated() considers
the memory pressure of both global and the socket's memcg on a func-
wide level, making the condition of memcg's pressure in question
redundant.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 net/core/sock.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7641d64293af..baccbb58a11a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	if (sk_has_memory_pressure(sk)) {
 		u64 alloc;
 
-		if (!sk_under_memory_pressure(sk))
+		if (!sk_under_global_memory_pressure(sk))
 			return 1;
 		alloc = sk_sockets_allocated_read_positive(sk);
+		/*
+		 * If under global pressure, allow the sockets that are below
+		 * average memory usage to raise, trying to be fair among all
+		 * the sockets under global constraints.
+		 */
 		if (sk_prot_mem_limits(sk, 2) > alloc *
 		    sk_mem_pages(sk->sk_wmem_queued +
 				 atomic_read(&sk->sk_rmem_alloc) +
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure()
  2023-05-22  7:01 ` [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure() Abel Wu
@ 2023-05-22 12:50   ` Simon Horman
  0 siblings, 0 replies; 13+ messages in thread
From: Simon Horman @ 2023-05-22 12:50 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

On Mon, May 22, 2023 at 03:01:20PM +0800, Abel Wu wrote:
> The status of global socket memory pressure is updated when:
> 
>   a) __sk_mem_raise_allocated():
> 
> 	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
> 	leave: sk_memory_allocated(sk) <= sysctl_mem[0]
> 
>   b) __sk_mem_reduce_allocated():
> 
> 	leave: sk_under_memory_pressure(sk) &&
> 		sk_memory_allocated(sk) < sysctl_mem[0]
> 
> So the conditions for leaving global pressure are inconsistent, which
> may lead to a situation where one pressured net-memcg prevents the
> global pressure from being cleared even though there is in fact no
> global pressure, so the global constraints unexpectedly remain in
> effect on the other sockets.
> 
> This patch fixes this by ignoring the net-memcg's pressure when
> deciding whether to leave global memory pressure.
> 
> Fixes: e1aab161e013 ("socket: initial cgroup code")

really pedantic nit:

Fixes: e1aab161e013 ("socket: initial cgroup code.")

> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>

...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-22  7:01 ` [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem Abel Wu
@ 2023-05-22 12:53   ` Simon Horman
  2023-05-25  1:22   ` Shakeel Butt
  1 sibling, 0 replies; 13+ messages in thread
From: Simon Horman @ 2023-05-22 12:53 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
> For now __sk_mem_raise_allocated() mainly considers global socket
> memory pressure and allows to raise if no global pressure observed,
> including the sockets whose memcgs are in pressure, which might
> result in longer memcg memstall.
> 
> So take net-memcg's pressure into consideration when allocating
> socket memory to alleviate long tail latencies.
> 
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
>  net/core/sock.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 801df091e37a..7641d64293af 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  {
>  	bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
>  	struct proto *prot = sk->sk_prot;
> -	bool charged = true;
> +	bool charged = true, pressured = false;
>  	long allocated;

Please preserve reverse xmas tree - longest line to shortest -
for local variables in neworking code.

...

-- 
pw-bot: cr


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure
  2023-05-22  7:01 ` [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure Abel Wu
@ 2023-05-22 12:54   ` Simon Horman
  2023-05-23  3:04     ` Abel Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Simon Horman @ 2023-05-22 12:54 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

On Mon, May 22, 2023 at 03:01:22PM +0800, Abel Wu wrote:
> Now with the preivous patch, __sk_mem_raise_allocated() considers

nit: s/preivous/previous/

> the memory pressure of both global and the socket's memcg on a func-
> wide level, making the condition of memcg's pressure in question
> redundant.
> 
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
>  net/core/sock.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7641d64293af..baccbb58a11a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  	if (sk_has_memory_pressure(sk)) {
>  		u64 alloc;
>  
> -		if (!sk_under_memory_pressure(sk))
> +		if (!sk_under_global_memory_pressure(sk))
>  			return 1;
>  		alloc = sk_sockets_allocated_read_positive(sk);
> +		/*
> +		 * If under global pressure, allow the sockets that are below
> +		 * average memory usage to raise, trying to be fair among all
> +		 * the sockets under global constraints.
> +		 */

nit:
		/* Multi-line comments in networking code
		 * look like this.
		 */

>  		if (sk_prot_mem_limits(sk, 2) > alloc *
>  		    sk_mem_pages(sk->sk_wmem_queued +
>  				 atomic_read(&sk->sk_rmem_alloc) +
> -- 
> 2.37.3
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure
  2023-05-22 12:54   ` Simon Horman
@ 2023-05-23  3:04     ` Abel Wu
  0 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-05-23  3:04 UTC (permalink / raw)
  To: Simon Horman
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

Hi Simon, thanks for reviewing! I will fix the coding style issues
in the next version!

Thanks,
	Abel

On 5/22/23 8:54 PM, Simon Horman wrote:
> On Mon, May 22, 2023 at 03:01:22PM +0800, Abel Wu wrote:
>> Now with the preivous patch, __sk_mem_raise_allocated() considers
> 
> nit: s/preivous/previous/
> 
>> the memory pressure of both global and the socket's memcg on a func-
>> wide level, making the condition of memcg's pressure in question
>> redundant.
>>
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>> ---
>>   net/core/sock.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 7641d64293af..baccbb58a11a 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3029,9 +3029,14 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>>   	if (sk_has_memory_pressure(sk)) {
>>   		u64 alloc;
>>   
>> -		if (!sk_under_memory_pressure(sk))
>> +		if (!sk_under_global_memory_pressure(sk))
>>   			return 1;
>>   		alloc = sk_sockets_allocated_read_positive(sk);
>> +		/*
>> +		 * If under global pressure, allow the sockets that are below
>> +		 * average memory usage to raise, trying to be fair among all
>> +		 * the sockets under global constraints.
>> +		 */
> 
> nit:
> 		/* Multi-line comments in networking code
> 		 * look like this.
> 		 */
> 
>>   		if (sk_prot_mem_limits(sk, 2) > alloc *
>>   		    sk_mem_pages(sk->sk_wmem_queued +
>>   				 atomic_read(&sk->sk_rmem_alloc) +
>> -- 
>> 2.37.3
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-22  7:01 ` [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem Abel Wu
  2023-05-22 12:53   ` Simon Horman
@ 2023-05-25  1:22   ` Shakeel Butt
  2023-05-29 11:58     ` Abel Wu
  1 sibling, 1 reply; 13+ messages in thread
From: Shakeel Butt @ 2023-05-25  1:22 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
> For now __sk_mem_raise_allocated() mainly considers global socket
> memory pressure and allows to raise if no global pressure observed,
> including the sockets whose memcgs are in pressure, which might
> result in longer memcg memstall.
> 
> So take net-memcg's pressure into consideration when allocating
> socket memory to alleviate long tail latencies.
> 
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>

Hi Abel,

Have you seen any real world production issue which is fixed by this
patch or is it more of a fix after reading code?

This code is quite subtle and small changes can cause unintended
behavior changes. At the moment the tcp memory accounting and memcg
accounting are intermingled and I think we should decouple them.

> ---
>  net/core/sock.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 801df091e37a..7641d64293af 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  {
>  	bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
>  	struct proto *prot = sk->sk_prot;
> -	bool charged = true;
> +	bool charged = true, pressured = false;
>  	long allocated;
>  
>  	sk_memory_allocated_add(sk, amt);
>  	allocated = sk_memory_allocated(sk);
> -	if (memcg_charge &&
> -	    !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
> -						gfp_memcg_charge())))
> -		goto suppress_allocation;
> +
> +	if (memcg_charge) {
> +		charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
> +						  gfp_memcg_charge());
> +		if (!charged)
> +			goto suppress_allocation;
> +		if (mem_cgroup_under_socket_pressure(sk->sk_memcg))

The memcg-under-pressure callback does an upward memcg tree walk, so
please make sure you have tested the performance impact of this.

> +			pressured = true;
> +	}
>  
>  	/* Under limit. */
> -	if (allocated <= sk_prot_mem_limits(sk, 0)) {
> +	if (allocated <= sk_prot_mem_limits(sk, 0))
>  		sk_leave_memory_pressure(sk);
> +	else
> +		pressured = true;
> +
> +	/* No pressure observed in global/memcg. */
> +	if (!pressured)
>  		return 1;
> -	}
>  
>  	/* Under pressure. */
>  	if (allocated > sk_prot_mem_limits(sk, 1))
> -- 
> 2.37.3
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-25  1:22   ` Shakeel Butt
@ 2023-05-29 11:58     ` Abel Wu
  2023-05-29 21:12       ` Shakeel Butt
  0 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-05-29 11:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Glauber Costa, netdev, linux-kernel

Hi Shakeel, thanks for reviewing! And sorry for replying so late,
I was on a vacation :)

On 5/25/23 9:22 AM, Shakeel Butt wrote:
> On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
>> For now __sk_mem_raise_allocated() mainly considers global socket
>> memory pressure and allows to raise if no global pressure observed,
>> including the sockets whose memcgs are in pressure, which might
>> result in longer memcg memstall.
>>
>> So take net-memcg's pressure into consideration when allocating
>> socket memory to alleviate long tail latencies.
>>
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> 
> Hi Abel,
> 
> Have you seen any real world production issue which is fixed by this
> patch or is it more of a fix after reading code?

The latter. But we do observe one common case in the production env:
a p2p service, which mainly downloads container images, running inside
a container with a tight memory limit can easily be throttled, stay
memstalled for long periods of time, and sometimes even get OOM-killed.
This service shows bursty usage of TCP memory, and I think it indeed
needs sockmem allocation to be suppressed when its memcg is already
under pressure. The memcg pressure is usually caused by too many page
caches, with the dirty ones being written back to slow backends. So it
makes no sense to keep receiving net data and consuming more memory.

> 
> This code is quite subtle and small changes can cause unintended
> behavior changes. At the moment the tcp memory accounting and memcg
> accounting are intermingled and I think we should decouple them.

My original intention in posting this patchset was to clarify that:

   - proto pressure only considers sysctl_mem[] (patch 2)
   - memcg pressure only indicates the pressure inside itself
   - consider both whenever needs allocation or reclaim (patch 1,3)

In this way, the two kinds of pressure maintain purer semantics, and
the socket core can react to both of them properly and consistently.

> 
>> ---
>>   net/core/sock.c | 23 ++++++++++++++++-------
>>   1 file changed, 16 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 801df091e37a..7641d64293af 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2977,21 +2977,30 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>>   {
>>   	bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
>>   	struct proto *prot = sk->sk_prot;
>> -	bool charged = true;
>> +	bool charged = true, pressured = false;
>>   	long allocated;
>>   
>>   	sk_memory_allocated_add(sk, amt);
>>   	allocated = sk_memory_allocated(sk);
>> -	if (memcg_charge &&
>> -	    !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
>> -						gfp_memcg_charge())))
>> -		goto suppress_allocation;
>> +
>> +	if (memcg_charge) {
>> +		charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
>> +						  gfp_memcg_charge());
>> +		if (!charged)
>> +			goto suppress_allocation;
>> +		if (mem_cgroup_under_socket_pressure(sk->sk_memcg))
> 
> The memcg-under-pressure callback does an upward memcg tree walk, so
> please make sure you have tested the performance impact of this.

Yes, I have run several benchmarks on a dual-socket machine based on
Intel Xeon(R) Platinum 8260 with SNC disabled, that is, 2 NUMA nodes
each of which has 24C/48T. All the benchmarks were run inside a
separate cgroup on a clean host. Below are the results of tbench4 and
netperf:

tbench4 Throughput (misleading but traditional)
                             baseline               patchset
Hmean     1        377.62 (   0.00%)      375.06 *  -0.68%*
Hmean     2        753.99 (   0.00%)      753.21 *  -0.10%*
Hmean     4       1503.50 (   0.00%)     1493.07 *  -0.69%*
Hmean     8       2941.43 (   0.00%)     2925.18 *  -0.55%*
Hmean     16      5637.59 (   0.00%)     5603.64 *  -0.60%*
Hmean     32      9042.90 (   0.00%)     9022.53 *  -0.23%*
Hmean     64     10530.55 (   0.00%)    10554.89 *   0.23%*
Hmean     128    24230.20 (   0.00%)    24424.74 *   0.80%*
Hmean     256    23798.21 (   0.00%)    23941.24 *   0.60%*
Hmean     384    23620.63 (   0.00%)    23569.54 *  -0.22%*

netperf-udp
                                    baseline               patchset
Hmean     send-64         281.99 (   0.00%)      274.50 *  -2.65%*
Hmean     send-128        556.70 (   0.00%)      545.82 *  -1.96%*
Hmean     send-256       1102.60 (   0.00%)     1091.21 *  -1.03%*
Hmean     send-1024      4180.48 (   0.00%)     4073.87 *  -2.55%*
Hmean     send-2048      7837.61 (   0.00%)     7707.12 *  -1.66%*
Hmean     send-3312     12157.49 (   0.00%)    11845.03 *  -2.57%*
Hmean     send-4096     14512.64 (   0.00%)    14156.45 *  -2.45%*
Hmean     send-8192     24015.40 (   0.00%)    23920.94 (  -0.39%)
Hmean     send-16384    39875.21 (   0.00%)    39696.67 (  -0.45%)
Hmean     recv-64         281.99 (   0.00%)      274.50 *  -2.65%*
Hmean     recv-128        556.70 (   0.00%)      545.82 *  -1.96%*
Hmean     recv-256       1102.60 (   0.00%)     1091.21 *  -1.03%*
Hmean     recv-1024      4180.48 (   0.00%)     4073.76 *  -2.55%*
Hmean     recv-2048      7837.61 (   0.00%)     7707.11 *  -1.67%*
Hmean     recv-3312     12157.49 (   0.00%)    11845.03 *  -2.57%*
Hmean     recv-4096     14512.62 (   0.00%)    14156.45 *  -2.45%*
Hmean     recv-8192     24015.29 (   0.00%)    23920.88 (  -0.39%)
Hmean     recv-16384    39873.93 (   0.00%)    39696.02 (  -0.45%)

netperf-tcp
                               baseline               patchset
Hmean     64        1777.05 (   0.00%)     1793.04 (   0.90%)
Hmean     128       3364.25 (   0.00%)     3451.05 *   2.58%*
Hmean     256       6309.21 (   0.00%)     6506.84 *   3.13%*
Hmean     1024     19571.52 (   0.00%)    19606.65 (   0.18%)
Hmean     2048     26467.00 (   0.00%)    26658.12 (   0.72%)
Hmean     3312     31312.36 (   0.00%)    31403.54 (   0.29%)
Hmean     4096     33263.37 (   0.00%)    33278.77 (   0.05%)
Hmean     8192     39961.82 (   0.00%)    40149.77 (   0.47%)
Hmean     16384    46065.33 (   0.00%)    46683.67 (   1.34%)

Except for a slight regression in netperf-udp, there is no obvious
performance win or loss. But as you reminded me of the cost of the
hierarchical walk, I re-tested the cases in a 5-level-deep cgroup
(originally 2 levels), and the results are:

tbench4 Throughput (misleading but traditional)
                             baseline               patchset
Hmean     1        361.93 (   0.00%)      367.58 *   1.56%*
Hmean     2        734.39 (   0.00%)      730.33 *  -0.55%*
Hmean     4       1426.82 (   0.00%)     1440.81 *   0.98%*
Hmean     8       2848.86 (   0.00%)     2860.87 *   0.42%*
Hmean     16      5436.72 (   0.00%)     5491.72 *   1.01%*
Hmean     32      8743.34 (   0.00%)     8913.27 *   1.94%*
Hmean     64     10345.41 (   0.00%)    10436.92 *   0.88%*
Hmean     128    23390.36 (   0.00%)    23353.09 *  -0.16%*
Hmean     256    23823.20 (   0.00%)    23509.79 *  -1.32%*
Hmean     384    23268.09 (   0.00%)    23178.10 *  -0.39%*

netperf-udp
                                    baseline               patchset
Hmean     send-64         278.31 (   0.00%)      275.68 *  -0.94%*
Hmean     send-128        554.52 (   0.00%)      547.46 (  -1.27%)
Hmean     send-256       1106.64 (   0.00%)     1103.01 (  -0.33%)
Hmean     send-1024      4135.84 (   0.00%)     4057.47 *  -1.89%*
Hmean     send-2048      7816.13 (   0.00%)     7732.71 *  -1.07%*
Hmean     send-3312     12068.32 (   0.00%)    11895.94 *  -1.43%*
Hmean     send-4096     14358.02 (   0.00%)    14304.06 (  -0.38%)
Hmean     send-8192     24041.57 (   0.00%)    24061.70 (   0.08%)
Hmean     send-16384    39996.09 (   0.00%)    39936.08 (  -0.15%)
Hmean     recv-64         278.31 (   0.00%)      275.68 *  -0.94%*
Hmean     recv-128        554.52 (   0.00%)      547.46 (  -1.27%)
Hmean     recv-256       1106.64 (   0.00%)     1103.01 (  -0.33%)
Hmean     recv-1024      4135.84 (   0.00%)     4057.47 *  -1.89%*
Hmean     recv-2048      7816.13 (   0.00%)     7732.71 *  -1.07%*
Hmean     recv-3312     12068.32 (   0.00%)    11895.94 *  -1.43%*
Hmean     recv-4096     14357.99 (   0.00%)    14304.04 (  -0.38%)
Hmean     recv-8192     24041.43 (   0.00%)    24061.58 (   0.08%)
Hmean     recv-16384    39995.72 (   0.00%)    39935.68 (  -0.15%)

netperf-tcp
                               baseline               patchset
Hmean     64        1779.93 (   0.00%)     1784.75 (   0.27%)
Hmean     128       3380.32 (   0.00%)     3424.14 (   1.30%)
Hmean     256       6383.37 (   0.00%)     6504.97 *   1.90%*
Hmean     1024     19345.07 (   0.00%)    19604.06 *   1.34%*
Hmean     2048     26547.60 (   0.00%)    26743.94 *   0.74%*
Hmean     3312     30948.40 (   0.00%)    31419.11 *   1.52%*
Hmean     4096     32888.83 (   0.00%)    33125.01 *   0.72%*
Hmean     8192     40020.38 (   0.00%)    39949.53 (  -0.18%)
Hmean     16384    46084.48 (   0.00%)    46300.43 (   0.47%)

Still no obvious difference, and even the udp regression is reduced.

Thanks & Best,
	Abel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-29 11:58     ` Abel Wu
@ 2023-05-29 21:12       ` Shakeel Butt
  2023-05-30  9:58         ` Abel Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Shakeel Butt @ 2023-05-29 21:12 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, linux-mm, cgroups


+linux-mm and cgroups

On Mon, May 29, 2023 at 07:58:45PM +0800, Abel Wu wrote:
> Hi Shakeel, thanks for reviewing! And sorry for replying so late,
> I was on a vocation :)
> 
> On 5/25/23 9:22 AM, Shakeel Butt wrote:
> > On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
> > > For now __sk_mem_raise_allocated() mainly considers global socket
> > > memory pressure and allows to raise if no global pressure observed,
> > > including the sockets whose memcgs are in pressure, which might
> > > result in longer memcg memstall.
> > > 
> > > So take net-memcg's pressure into consideration when allocating
> > > socket memory to alleviate long tail latencies.
> > > 
> > > Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> > 
> > Hi Abel,
> > 
> > Have you seen any real world production issue which is fixed by this
> > patch or is it more of a fix after reading code?
> 
> The latter. But we do observe one common case in the production env:
> a p2p service, which mainly downloads container images, running inside
> a container with a tight memory limit can easily be throttled, stay
> memstalled for long periods of time, and sometimes even get OOM-killed.
> This service shows bursty usage of TCP memory, and I think it indeed
> needs sockmem allocation to be suppressed when its memcg is already
> under pressure. The memcg pressure is usually caused by too many page
> caches, with the dirty ones being written back to slow backends. So it
> makes no sense to keep receiving net data and consuming more memory.
> 

We actually made an intentional decision not to throttle the incoming
traffic under memory pressure. See 720ca52bcef22 ("net-memcg: avoid
stalls when under memory pressure"). If you think the throttling
behavior is preferable for your application, please propose that patch
separately and we can work out how to enable a flexible policy here.

> > 
> > This code is quite subtle and small changes can cause unintended
> > behavior changes. At the moment the tcp memory accounting and memcg
> > accounting is intermingled and I think we should decouple them.
> 
> My original intention in posting this patchset is to clarify that:
> 
>   - proto pressure only considers sysctl_mem[] (patch 2)
>   - memcg pressure only indicates the pressure inside itself
>   - consider both whenever allocation or reclaim is needed (patch 1,3)
> 
> In this way, the two kinds of pressure maintain purer semantics, and
> the socket core can react to both of them properly and consistently.

Can you please resend your patch series (without patch 3) and Cc
linux-mm, cgroups list and memcg maintainers as well?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem
  2023-05-29 21:12       ` Shakeel Butt
@ 2023-05-30  9:58         ` Abel Wu
  0 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-05-30  9:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, linux-mm, cgroups

On 5/30/23 5:12 AM, Shakeel Butt wrote:
> 
> +linux-mm and cgroups
> 
> On Mon, May 29, 2023 at 07:58:45PM +0800, Abel Wu wrote:
>> Hi Shakeel, thanks for reviewing! And sorry for replying so late,
>> I was on a vacation :)
>>
>> On 5/25/23 9:22 AM, Shakeel Butt wrote:
>>> On Mon, May 22, 2023 at 03:01:21PM +0800, Abel Wu wrote:
>>>> For now, __sk_mem_raise_allocated() mainly considers global socket
>>>> memory pressure and allows raising if no global pressure is
>>>> observed, even for sockets whose memcgs are under pressure, which
>>>> might result in longer memcg memstalls.
>>>>
>>>> So take net-memcg's pressure into consideration when allocating
>>>> socket memory to alleviate long tail latencies.
>>>>
>>>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>>>
>>> Hi Abel,
>>>
>>> Have you seen any real world production issue which is fixed by this
>>> patch or is it more of a fix after reading code?
>>
>> The latter. But we do observe one common case in the production env:
>> a p2p service, which mainly downloads container images, running
>> inside a container with a tight memory limit, can easily be throttled
>> and stay memstalled for a long period of time, and sometimes even get
>> OOM-killed. This service shows bursty usage of TCP memory, and I
>> think it indeed needs suppressing sockmem allocation if the memcg is
>> already under pressure. The memcg pressure is usually caused by too
>> many page caches, with the dirty ones being written back to slow
>> backends. So it is insane to keep receiving net data that consumes
>> even more memory.
>>
> 
> We actually made an intentional decision to not throttle the incoming
> traffic under memory pressure. See 720ca52bcef22 ("net-memcg: avoid
> stalls when under memory pressure"). If you think the throttling
> behavior is preferred for your application, please propose the patch
> separately and we can work on how to enable flexible policy here.

Ah I see. Thanks for providing the context. So suppressing the
allocation under memcg pressure could further keep senders waiting if
SACKed segments get dropped from the OFO queue.

> 
>>>
>>> This code is quite subtle and small changes can cause unintended
>>> behavior changes. At the moment the tcp memory accounting and memcg
>>> accounting is intermingled and I think we should decouple them.
>>
>> My original intention in posting this patchset is to clarify that:
>>
>>    - proto pressure only considers sysctl_mem[] (patch 2)
>>    - memcg pressure only indicates the pressure inside itself
>>    - consider both whenever allocation or reclaim is needed (patch 1,3)
>>
>> In this way, the two kinds of pressure maintain purer semantics, and
>> the socket core can react to both of them properly and consistently.
> 
> Can you please resend your patch series (without patch 3) and Cc to
> linux-mm, cgroups list and memcg maintainers as well?

Yeah, absolutely.

Thanks,
	Abel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-05-30  9:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-22  7:01 [PATCH v2 0/4] sock: Improve condition on sockmem pressure Abel Wu
2023-05-22  7:01 ` [PATCH v2 1/4] sock: Always take memcg pressure into consideration Abel Wu
2023-05-22  7:01 ` [PATCH v2 2/4] sock: Fix misuse of sk_under_memory_pressure() Abel Wu
2023-05-22 12:50   ` Simon Horman
2023-05-22  7:01 ` [PATCH v2 3/4] sock: Consider memcg pressure when raising sockmem Abel Wu
2023-05-22 12:53   ` Simon Horman
2023-05-25  1:22   ` Shakeel Butt
2023-05-29 11:58     ` Abel Wu
2023-05-29 21:12       ` Shakeel Butt
2023-05-30  9:58         ` Abel Wu
2023-05-22  7:01 ` [PATCH v2 4/4] sock: Remove redundant cond of memcg pressure Abel Wu
2023-05-22 12:54   ` Simon Horman
2023-05-23  3:04     ` Abel Wu
