All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tony Lu <tonylu@linux.alibaba.com>
To: Jan Karcher <jaka@linux.ibm.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, linux-s390@vger.kernel.org,
	Heiko Carstens <hca@linux.ibm.com>,
	Alexandra Winter <wintera@linux.ibm.com>,
	Wenjia Zhang <wenjia@linux.ibm.com>,
	Thorsten Winkler <twinkler@linux.ibm.com>,
	Stefan Raspl <raspl@linux.ibm.com>,
	Karsten Graul <kgraul@linux.ibm.com>
Subject: Re: [PATCH net] net/smc: Fix expected buffersizes and sync logic
Date: Wed, 23 Nov 2022 19:53:40 +0800	[thread overview]
Message-ID: <Y34JxFWBdUxvLQb4@TonyMac-Alibaba> (raw)
In-Reply-To: <20221123104907.14624-1-jaka@linux.ibm.com>

On Wed, Nov 23, 2022 at 11:49:07AM +0100, Jan Karcher wrote:
> The fixed commit changed the expected behavior of buffersizes
> set by the user using the setsockopt mechanism.
> Before the fixed patch the logic for determining the buffersizes used
> was the following:
> 
> default  = net.ipv4.tcp_{w|r}mem[1]
> sockopt  = the setsockopt mechanism
> val      = the value assigned in default or via setsockopt
> sk_buf   = short for sk_{snd|rcv}buf
> real_buf = the real size of the buffer (sk_buf_size in __smc_buf_create)
> 
>   exposed   | net/core/sock.c  |    af_smc.c    |  smc_core.c
>             |                  |                |
> +---------+ |                  | +------------+ | +-------------------+
> | default |----------------------| sk_buf=val |---| real_buf=sk_buf/2 |
> +---------+ |                  | +------------+ | +-------------------+
>             |                  |                |    ^
>             |                  |                |    |
> +---------+ | +--------------+ |                |    |
> | sockopt |---| sk_buf=val*2 |-----------------------|
> +---------+ | +--------------+ |                |
>             |                  |                |
> 
> The fixed patch introduced a dedicated sysctl for smc
> and removed the /2 in smc_core.c resulting in the following flow:
> 
> default  = net.smc.{w|r}mem (which defaults to net.ipv4.tcp_{w|r}mem[1])
> sockopt  = the setsockopt mechanism
> val      = the value assigned in default or via setsockopt
> sk_buf   = short for sk_{snd|rcv}buf
> real_buf = the real size of the buffer (sk_buf_size in __smc_buf_create)
> 
>   exposed   | net/core/sock.c  |    af_smc.c    |  smc_core.c
>             |                  |                |
> +---------+ |                  | +------------+ | +-----------------+
> | default |----------------------| sk_buf=val |---| real_buf=sk_buf |
> +---------+ |                  | +------------+ | +-----------------+
>             |                  |                |    ^
>             |                  |                |    |
> +---------+ | +--------------+ |                |    |
> | sockopt |---| sk_buf=val*2 |-----------------------|
> +---------+ | +--------------+ |                |
>             |                  |                |
> 
> This would result in double of memory used for existing configurations
> that are using setsockopt.

Firstly, thanks for your detailed diagrams :-)

And the original decision to use user-provided values rather than
value/2 to follow the instructions of the socket manual [1].

  SO_RCVBUF
         Sets or gets the maximum socket receive buffer in bytes.
         The kernel doubles this value (to allow space for
         bookkeeping overhead) when it is set using setsockopt(2),
         and this doubled value is returned by getsockopt(2).  The
         default value is set by the
         /proc/sys/net/core/rmem_default file, and the maximum
         allowed value is set by the /proc/sys/net/core/rmem_max
         file.  The minimum (doubled) value for this option is 256.

[1] https://man7.org/linux/man-pages/man7/socket.7.html

The user of SMC should know that setsockopt() with SO_{RCV|SND}BUF will
double the values in kernel, and getsockopt() will return the doubled
values. So that they should use half of the values which are passed to
setsockopt(). The original patch tries to make things easier in SMC and
let user-space to handle them following the socket manual.

> SMC historically decided to use the explicit value given by the user
> to allocate the memory. This is why we used the /2 in smc_core.c.
> That logic was not applied to the default value.

Yep, let back to the patch which introduced smc_{w|r}mem knobs, it's a
trade-off to follow original logic of SMC, or follow the socket manual.
We decides to follow the instruction of manuals in the end.

Cheers,
Tony Lu

> Since we now have our own sysctl, which is also exposed to the user,
> we should sync the logic in a way that both values are the real value
> used by our code and shown by smc_stats. To achieve this this patch
> changes the behavior to:
> 
> default  = net.smc.{w|r}mem (which defaults to net.ipv4.tcp_{w|r}mem[1])
> sockopt  = the setsockopt mechanism
> val      = the value assigned in default or via setsockopt
> sk_buf   = short for sk_{snd|rcv}buf
> real_buf = the real size of the buffer (sk_buf_size in __smc_buf_create)
> 
>   exposed   | net/core/sock.c  |    af_smc.c     |  smc_core.c
>             |                  |                 |
> +---------+ |                  | +-------------+ | +-----------------+
> | default |----------------------| sk_buf=val*2|---|real_buf=sk_buf/2|
> +---------+ |                  | +-------------+ | +-----------------+
>             |                  |                 |    ^
>             |                  |                 |    |
> +---------+ | +--------------+ |                 |    |
> | sockopt |---| sk_buf=val*2 |------------------------|
> +---------+ | +--------------+ |                 |
>             |                  |                 |
> 
> This way both paths follow the same pattern and the expected behavior
> is re-established.
> 
> Fixes: 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and make them tunable")
> Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
> ---
>  net/smc/af_smc.c   | 9 +++++++--
>  net/smc/smc_core.c | 8 ++++----
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 036532cf39aa..a8c84e7bac99 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -366,6 +366,7 @@ static void smc_destruct(struct sock *sk)
>  static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
>  				   int protocol)
>  {
> +	int buffersize_without_overhead;
>  	struct smc_sock *smc;
>  	struct proto *prot;
>  	struct sock *sk;
> @@ -379,8 +380,12 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
>  	sk->sk_state = SMC_INIT;
>  	sk->sk_destruct = smc_destruct;
>  	sk->sk_protocol = protocol;
> -	WRITE_ONCE(sk->sk_sndbuf, READ_ONCE(net->smc.sysctl_wmem));
> -	WRITE_ONCE(sk->sk_rcvbuf, READ_ONCE(net->smc.sysctl_rmem));
> +	buffersize_without_overhead =
> +		min_t(int, READ_ONCE(net->smc.sysctl_wmem), INT_MAX / 2);
> +	WRITE_ONCE(sk->sk_sndbuf, buffersize_without_overhead * 2);
> +	buffersize_without_overhead =
> +		min_t(int, READ_ONCE(net->smc.sysctl_rmem), INT_MAX / 2);
> +	WRITE_ONCE(sk->sk_rcvbuf, buffersize_without_overhead * 2);
>  	smc = smc_sk(sk);
>  	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
>  	INIT_WORK(&smc->connect_work, smc_connect_work);
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 00fb352c2765..36850a2ae167 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -2314,10 +2314,10 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
>  
>  	if (is_rmb)
>  		/* use socket recv buffer size (w/o overhead) as start value */
> -		sk_buf_size = smc->sk.sk_rcvbuf;
> +		sk_buf_size = smc->sk.sk_rcvbuf / 2;
>  	else
>  		/* use socket send buffer size (w/o overhead) as start value */
> -		sk_buf_size = smc->sk.sk_sndbuf;
> +		sk_buf_size = smc->sk.sk_sndbuf / 2;
>  
>  	for (bufsize_short = smc_compress_bufsize(sk_buf_size, is_smcd, is_rmb);
>  	     bufsize_short >= 0; bufsize_short--) {
> @@ -2376,7 +2376,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
>  	if (is_rmb) {
>  		conn->rmb_desc = buf_desc;
>  		conn->rmbe_size_short = bufsize_short;
> -		smc->sk.sk_rcvbuf = bufsize;
> +		smc->sk.sk_rcvbuf = bufsize * 2;
>  		atomic_set(&conn->bytes_to_rcv, 0);
>  		conn->rmbe_update_limit =
>  			smc_rmb_wnd_update_limit(buf_desc->len);
> @@ -2384,7 +2384,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
>  			smc_ism_set_conn(conn); /* map RMB/smcd_dev to conn */
>  	} else {
>  		conn->sndbuf_desc = buf_desc;
> -		smc->sk.sk_sndbuf = bufsize;
> +		smc->sk.sk_sndbuf = bufsize * 2;
>  		atomic_set(&conn->sndbuf_space, bufsize);
>  	}
>  	return 0;
> -- 
> 2.34.1

  reply	other threads:[~2022-11-23 11:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-23 10:49 [PATCH net] net/smc: Fix expected buffersizes and sync logic Jan Karcher
2022-11-23 11:53 ` Tony Lu [this message]
2022-11-23 13:13   ` Jan Karcher
2022-11-23 13:41     ` Tony Lu
2022-11-24 13:00       ` Alexandra Winter
2022-11-24 14:07         ` Alexandra Winter
2022-11-25  6:15           ` Jan Karcher
2022-11-25  7:05             ` Tony Lu
2022-11-25 10:59               ` Alexandra Winter
2022-11-28  4:33                 ` Tony Lu
2022-11-28  8:32                   ` Wenjia Zhang
2022-11-28 12:24                     ` Tony Lu
2022-12-02 13:10                 ` Jan Karcher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y34JxFWBdUxvLQb4@TonyMac-Alibaba \
    --to=tonylu@linux.alibaba.com \
    --cc=davem@davemloft.net \
    --cc=hca@linux.ibm.com \
    --cc=jaka@linux.ibm.com \
    --cc=kgraul@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=raspl@linux.ibm.com \
    --cc=twinkler@linux.ibm.com \
    --cc=wenjia@linux.ibm.com \
    --cc=wintera@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.