From: Karsten Graul <kgraul@linux.ibm.com>
To: Tony Lu <tonylu@linux.alibaba.com>
Cc: "D. Wythe" <alibuda@linux.alibaba.com>,
	dust.li@linux.alibaba.com, kuba@kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next v2] net/smc: Reduce overflow of smc clcsock listen queue
Date: Thu, 13 Jan 2022 09:07:51 +0100	[thread overview]
Message-ID: <5a5ba1b6-93d7-5c1e-aab2-23a52727fbd1@linux.ibm.com> (raw)
In-Reply-To: <YdaUuOq+SkhYTWU8@TonyMac-Alibaba>

On 06/01/2022 08:05, Tony Lu wrote:
> On Wed, Jan 05, 2022 at 08:13:23PM +0100, Karsten Graul wrote:
>> On 05/01/2022 16:06, D. Wythe wrote:
>>> LGTM. Falling back makes the restrictions on SMC dangling
>>> connections more meaningful to me than dropping them.
>>>
>>> Overall, I see two scenarios:
>>>
>>> 1. Drop the overflow connections that exceed what the userspace
>>> application can accept.
>>>
>>> 2. Fall back the overflow connections that exceed the capacity of
>>> the heavy SMC handshake process. (We can also control this behavior
>>> through sysctl.)
>>>
>>
>> I vote for (2), which makes the behavior more TCP-like from the user space application's point of view.
> Falling back when SMC reaches its own limit is a good idea. I'm not
> sure the fallback reason is suitable, though; this looks like a
> non-error condition. Currently, SMC falls back on error conditions,
> such as a resource not being available or an internal error. This case
> is not an error.

SMC falls back when the SMC processing cannot be completed, e.g. due to 
resource constraints like memory. For me the time/duration constraint is
also a good reason to fall back to TCP.
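When SMC declines or falls back, it records one of the SMC_CLC_DECL_* reason codes from net/smc/smc_clc.h. As a sketch of the point being debated here, a load-based reason could sit next to the existing error reasons; the first two codes below are modeled on the kernel header, while SMC_CLC_DECL_MAXCONN, its value, and the smc_decl_str() helper are invented for illustration only:

```c
#include <assert.h>
#include <string.h>

/* The first two codes are modeled on net/smc/smc_clc.h; the third is a
 * hypothetical new code for a load-based (non-error) fallback. */
#define SMC_CLC_DECL_MEM        0x01010000 /* insufficient memory resources */
#define SMC_CLC_DECL_TIMEOUT_CL 0x02010000 /* timeout w4 QP confirm link */
#define SMC_CLC_DECL_MAXCONN    0x01110000 /* hypothetical: handshake capacity reached */

/* Map a reason code to a short description for diagnostics. */
static const char *smc_decl_str(unsigned int code)
{
	switch (code) {
	case SMC_CLC_DECL_MEM:        return "insufficient memory";
	case SMC_CLC_DECL_TIMEOUT_CL: return "confirm-link timeout";
	case SMC_CLC_DECL_MAXCONN:    return "handshake capacity reached";
	default:                      return "unknown";
	}
}
```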

> 
> And I'm not sure about mixing normal and fallback connections at the
> same time while no error has occurred and no hard limit has been
> reached. Is that easy for users to reason about? It might be
> confusing. Perhaps a userspace parameter should control this limit and
> the behaviour (drop or fallback).

I am thinking of the following approach: the default maximum number of active
workers in a work queue is defined by WQ_MAX_ACTIVE (512). When this limit is
hit we have slightly fewer than 512 parallel SMC handshakes running at that
moment, and new workers would be enqueued without becoming active.
In that case (maximum active workers reached) I would tend to fall back new
connections to TCP. We would end up with fewer connections using SMC, but for
the user space applications there would be almost no change compared to TCP (no
dropped TCP connection attempts, no need to reconnect).
Imho, most users will never run into this problem, so I think it's fine to behave like this.
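The threshold check described above can be sketched in a few lines of userspace C. WQ_MAX_ACTIVE matches the default from include/linux/workqueue.h; the counter and helper names (smc_hs_active, smc_listen_should_fallback) are hypothetical and not taken from the actual net/smc code:

```c
#include <assert.h>
#include <stdbool.h>

/* Default per-workqueue limit on concurrently active work items;
 * include/linux/workqueue.h defines WQ_MAX_ACTIVE as 512. */
#define WQ_MAX_ACTIVE 512

/* Hypothetical counter of SMC handshake workers currently active. */
static int smc_hs_active;

/*
 * Sketch of the proposed policy: once the workqueue cannot run any more
 * handshake workers in parallel, fall the new connection back to plain
 * TCP instead of enqueueing yet another worker. The connection still
 * succeeds from the application's point of view, just without SMC.
 */
static bool smc_listen_should_fallback(void)
{
	return smc_hs_active >= WQ_MAX_ACTIVE;
}
```

A listener would increment smc_hs_active when scheduling a handshake worker and decrement it when the worker finishes; only connections arriving while the limit is reached take the TCP path.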

As far as I understand you, you still see a good reason to have another behavior
implemented in parallel (controllable by the user) that enqueues all incoming
connections as in your patch proposal? But how would you deal with the
out-of-memory problems that might happen with that?

>  
>> One comment on sysctl: our current approach is to add new switches to the existing 
>> netlink interface, which can be used with the smc-tools package (or your own implementation, of course). 
>> Is this prerequisite problematic in your environment? 
>> We tried to avoid more sysctls, and the netlink interface keeps us more flexible.
> 
> I agree with you that netlink is more flexible. However, there are
> some differences in our environment that make netlink harder to use to
> control the behavior of SMC.
> 
> Compared with netlink, sysctl is:
> - easier to use on clusters. Applications that want to use SMC don't
>   need to deploy additional tools or develop their own netlink logic,
>   especially across thousands of machines or containers. Going forward
>   with SMC, we would have to make sure the package or logic stays
>   compatible with the current kernel, whereas sysctl API compatibility
>   is easy to verify.
> 
> - easier for config templates and default maintenance. We use
>   /etc/sysctl.conf to keep the system configuration up to date, such
>   as pre-tuned SMC config parameters, so we can change the default
>   values on boot and generate lots of machines from one machine
>   template. Userspace netlink tools don't suit this; for ip-related
>   config, for example, we need an additional NetworkManager or netctl
>   to do it.
> 
> - TCP-like. TCP provides lots of sysctls to configure itself;
>   sometimes they are hard to use and understand, but they are accepted
>   by most users and systems. Maybe we could use sysctl for the items
>   that are simple and frequently changed, and netlink for the complex
>   items.
> 
> We are glad to contribute to smc-tools. Using both netlink and sysctl,
> I think, is the more suitable choice.
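As an illustration of the template deployment described above, such a pre-tuned /etc/sysctl.conf fragment might look like the following; the net.smc.* keys shown are hypothetical, since SMC does not define these sysctls today:

```
# /etc/sysctl.conf fragment baked into a machine or container template.
# The net.smc.* keys below are hypothetical, for illustration only.
net.smc.limit_handshake = 1
net.smc.max_handshake_workers = 512
```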

Let's decide that when you have a specific control that you want to implement.
I want to have a very good reason before introducing another interface into the
SMC module, making the code more complex and so on. The decision for the netlink
interface was also made because we had the impression that this is the NEW way
to go, and since we had no interface before, we started with the most modern way to implement it.

TCP et al. have a history with sysctl, so that's why it is still there.
But I might be wrong on that...


Thread overview: 18+ messages
2022-01-04 13:12 [PATCH net-next v2] net/smc: Reduce overflow of smc clcsock listen queue D. Wythe
2022-01-04 13:45 ` Karsten Graul
2022-01-04 16:17   ` D. Wythe
2022-01-05  4:40   ` D. Wythe
2022-01-05  8:28     ` Tony Lu
2022-01-05  8:57     ` dust.li
2022-01-05 13:17       ` Karsten Graul
2022-01-05 15:06         ` D. Wythe
2022-01-05 19:13           ` Karsten Graul
2022-01-06  7:05             ` Tony Lu
2022-01-13  8:07               ` Karsten Graul [this message]
2022-01-13 18:50                 ` Jakub Kicinski
2022-01-20 13:39                 ` Tony Lu
2022-01-20 16:00                   ` Stefan Raspl
2022-01-21  2:47                     ` Tony Lu
2022-02-16 11:46                 ` dust.li
2022-01-06  3:51           ` D. Wythe
2022-01-06  9:54             ` Karsten Graul
