* [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
@ 2016-03-11 18:29 Sowmini Varadhan
  2016-03-11 19:09 ` Stephen Hemminger
  2016-03-11 19:14 ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-11 18:29 UTC (permalink / raw)
  To: netdev, rds-devel; +Cc: sowmini.varadhan, santosh.shilimkar, davem


Some payload sizes/patterns stand to gain performance benefits by
tuning the size of the TCP socket buffers, so this commit adds
module parameters to customize those values when desired.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/tcp.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index ad60299..b59e7a2 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -52,7 +52,13 @@ static LIST_HEAD(rds_tcp_conn_list);
 
 static struct kmem_cache *rds_tcp_conn_slab;
 
-#define RDS_TCP_DEFAULT_BUFSIZE (128 * 1024)
+static int sndbuf_size = 16384;
+module_param(sndbuf_size, int, 0444);
+MODULE_PARM_DESC(sndbuf_size, "SO_SNDBUF size of kernel tcp socket");
+
+static int rcvbuf_size = 87380;
+module_param(rcvbuf_size, int, 0444);
+MODULE_PARM_DESC(rcvbuf_size, "SO_RCVBUF size of kernel tcp socket");
 
 /* doing it this way avoids calling tcp_sk() */
 void rds_tcp_nonagle(struct socket *sock)
@@ -72,7 +78,15 @@ void rds_tcp_nonagle(struct socket *sock)
  */
 void rds_tcp_tune(struct socket *sock)
 {
+	struct sock *sk = sock->sk;
+
 	rds_tcp_nonagle(sock);
+
+	lock_sock(sk);
+	sk->sk_sndbuf = sndbuf_size;
+	sk->sk_rcvbuf = rcvbuf_size;
+	sk->sk_userlocks |= SOCK_SNDBUF_LOCK | SOCK_RCVBUF_LOCK;
+	release_sock(sk);
 }
 
 u32 rds_tcp_snd_nxt(struct rds_tcp_connection *tc)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-11 18:29 [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket Sowmini Varadhan
@ 2016-03-11 19:09 ` Stephen Hemminger
  2016-03-11 19:12   ` Sowmini Varadhan
  2016-03-12  2:43   ` Sowmini Varadhan
  2016-03-11 19:14 ` David Miller
  1 sibling, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2016-03-11 19:09 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, rds-devel, santosh.shilimkar, davem

On Fri, 11 Mar 2016 13:29:49 -0500
Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:

>  
> -#define RDS_TCP_DEFAULT_BUFSIZE (128 * 1024)
> +static int sndbuf_size = 16384;
> +module_param(sndbuf_size, int, 0444);
> +MODULE_PARM_DESC(sndbuf_size, "SO_SNDBUF size of kernel tcp socket");
> +
> +static int rcvbuf_size = 87380;
> +module_param(rcvbuf_size, int, 0444);
> +MODULE_PARM_DESC(rcvbuf_size, "SO_RCVBUF size of kernel tcp socket");

Module parameters are a problem for distributions and should only be used
as a last resort.

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-11 19:09 ` Stephen Hemminger
@ 2016-03-11 19:12   ` Sowmini Varadhan
  2016-03-12  2:43   ` Sowmini Varadhan
  1 sibling, 0 replies; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-11 19:12 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, rds-devel, santosh.shilimkar, davem

On (03/11/16 11:09), Stephen Hemminger wrote:
> 
> Module parameters are a problem for distributions and should only be used
> as a last resort.

I was not aware of that. Out of curiosity, what is the associated problem?

What would be the alternative recommendation in this case?

--Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-11 18:29 [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket Sowmini Varadhan
  2016-03-11 19:09 ` Stephen Hemminger
@ 2016-03-11 19:14 ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2016-03-11 19:14 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: netdev, rds-devel, santosh.shilimkar

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 11 Mar 2016 13:29:49 -0500

> Some payload sizes/patterns stand to gain performance benefits by
> tuning the size of the TCP socket buffers, so this commit adds
> module parameters to customize those values when desired.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

No module parameters, please.

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-11 19:09 ` Stephen Hemminger
  2016-03-11 19:12   ` Sowmini Varadhan
@ 2016-03-12  2:43   ` Sowmini Varadhan
  2016-03-12  3:21     ` Tom Herbert
  1 sibling, 1 reply; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-12  2:43 UTC (permalink / raw)
  To: Stephen Hemminger, davem; +Cc: netdev

On (03/11/16 11:09), Stephen Hemminger wrote:
> Module parameters are a problem for distributions and should only be used
> as a last resort.

I don't know the history of what the distribution problem is, but I
did try to use sysctl as an alternative for this. I'm starting to 
believe that this is one case where module params, with all their 
problems, are the least evil option. Here's what I find if I use sysctl:

- being able to tune the sndbuf and rcvbuf actually gives me a noticeable
  2X perf improvement over the default for DB/Cluster request/response
  transactions, where the classic req size is 8K bytes, response is 256
  bytes, and there are a large number of such concurrent transactions
  queued up on the kernel tcp socket. (The defaults work well for 
  larger packet sizes, but as I noted in the commit, packet sizes vary
  in actual deployment).

Assuming I use sysctl:

- by the time the admin gets to execute the sysctl, the kernel listen
  socket for RDS_TCP_PORT would already have been created, and an 
  arbitrary number of accept/connect (kernel) endpoints may have been
  created.  According to tcp(7), you should be setting the buf sizes before
  connect/listen. So using sysctl means you have to reset a variable
  number of existing cluster connections. All this can be done, but 
  adds a large mass of confusing code to reset kernel sockets and 
  also get the cluster HA/failover right.
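
  An illustrative userspace sketch of the tcp(7) point above (plain
  sockets, not RDS code): on Linux the kernel latches SO_SNDBUF/SO_RCVBUF
  at setsockopt() time, doubling the requested value to account for
  bookkeeping overhead, which is why the values have to be in place
  before connect()/listen():

```python
# Sketch of the tcp(7) constraint: buffer sizes should be set *before*
# connect()/listen(). Linux doubles the requested value internally to
# account for bookkeeping overhead (see socket(7)).
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16384)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 87380)

# getsockopt() reports the doubled value the kernel actually reserved.
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))  # 32768 on Linux
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # 174760 on Linux
s.close()
```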

- at first I thought sysctl was attractive because it was netns aware,
  but found that it is only superficially so. The ->proc_handler does
  not pass in the struct net *, and setting up the ctl_table's ->data
  to a per-net var is not a simple thing to do. Thus, even though
  register_net_sysctl() takes a net * pointer, my handler has to do
  extra ugly things to get to per-net vars.

I don't know if there is a better alternative to sysctl/module_param
for achieving what I'm trying to do, which is to set up the params
for the kernel sockets before creating them. If yes, some 
hints/rtfms would be helpful.

--Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-12  2:43   ` Sowmini Varadhan
@ 2016-03-12  3:21     ` Tom Herbert
  2016-03-12  3:44       ` Sowmini Varadhan
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Herbert @ 2016-03-12  3:21 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers

On Fri, Mar 11, 2016 at 6:43 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (03/11/16 11:09), Stephen Hemminger wrote:
>> Module parameters are a problem for distributions and should only be used
>> as a last resort.
>
> I don't know the history of what the distribution problem is, but I
> did try to use sysctl as an alternative for this. I'm starting to
> believe that this is one case where module params, with all their
> problems, are the least evil option. Here's what I find if I use sysctl:
>
> - being able to tune the sndbuf and rcvbuf actually gives me a noticeable
>   2X perf improvement over the default for DB/Cluster request/response
>   transactions, where the classic req size is 8K bytes, response is 256
>   bytes, and there are a large number of such concurrent transactions
>   queued up on the kernel tcp socket. (The defaults work well for
>   larger packet sizes, but as I noted in the commit, packet sizes vary
>   in actual deployment).
>
> Assuming I use sysctl:
>
> - by the time the admin gets to execute the sysctl, the kernel listen
>   socket for RDS_TCP_PORT would already have been created, and an
>   arbitrary number of accept/connect (kernel) endpoints may have been
>   created.  According to tcp(7), you should be setting the buf sizes before
>   connect/listen. So using sysctl means you have to reset a variable
>   number of existing cluster connections. All this can be done, but
>   adds a large mass of confusing code to reset kernel sockets and
>   also get the cluster HA/failover right.
>
> - at first I thought sysctl was attractive because it was netns aware,
>   but found that it is  only superficially so. The ->proc_handler does
>   not pass in the struct net *, and setting up the ctl_table's ->data
>   to a per-net var is not a simple thing to do. Thus, even though
>   register_net_sysctl() takes a net * pointer, my handler has to do
>   extra ugly things to get to per-net vars.
>
> I don't know if there is a better alternative to sysctl/module_param
> for achieving what I'm trying to do, which is to set up the params
> for the kernel sockets before creating them. If yes, some
> hints/rtfms would be helpful.
>
You could consider an alternate model where connection management
(connect, listen, etc.) is performed by a userspace daemon, with the
resulting sockets then attached to the kernel (pretty much like KCM
does). This moves complexity out of the kernel, and gives you the
flexibility to arbitrarily configure TCP sockets (like the 4 setsockopts
in rds_tcp_keepalive could be configurable). The other cool thing this
would allow is switching an existing TCP connection into RDS mode
(as is done with several protocols on an http port).

Tom

> --Sowmini
>

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-12  3:21     ` Tom Herbert
@ 2016-03-12  3:44       ` Sowmini Varadhan
  2016-03-12  4:07         ` Tom Herbert
  0 siblings, 1 reply; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-12  3:44 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers

On (03/11/16 19:21), Tom Herbert wrote:
> You could consider an alternate model where connection management
> (connect, listen, etc.) is performed by a userspace daemon, with the
> resulting sockets then attached to the kernel (pretty much like KCM does). This moves

You have to remember that, like it or not, RDS has some IB legacy 
compat requirements that cannot be shed easily.  The above suggestion
suddenly requires an extra daemon per node in a cluster
involving 100's of nodes with very complex DB setup procedures and
rigid HA requirements (nodes in the cluster try to reconnect
on connection failure).  An extra daemon, with its own startup requirements
and additional latency, would be hard to justify for something
that can otherwise be done with module_param.

As such, the above suggestion is even more non-trivial than the "ugly"
code I referred to in my previous email.

It is actually far simpler to just tell the cluster-setup scripts to
refrain from an ifup of the relevant interfaces till all the config 
is set up.

Besides, the basic problem remains: for an arbitrary kernel module 
that has parameters that need to be customized before module init,
what are the options today?

--Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-12  3:44       ` Sowmini Varadhan
@ 2016-03-12  4:07         ` Tom Herbert
  2016-03-12  4:39           ` Sowmini Varadhan
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Herbert @ 2016-03-12  4:07 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers

On Fri, Mar 11, 2016 at 7:44 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (03/11/16 19:21), Tom Herbert wrote:
>> You could consider an alternate model where connection management
>> (connect, listen, etc.) is performed by a userspace daemon, with the
>> resulting sockets then attached to the kernel (pretty much like KCM does). This moves
>
> You have to remember that, like it or not, RDS has some IB legacy
> compat requirements that cannot be shed easily.  The above suggestion
> suddenly requires an extra daemon per node in a cluster
> involving 100's of nodes with very complex DB setup procedures and
> rigid HA requirements (nodes in the cluster try to reconnect
> on connection failure).  An extra daemon, with its own startup requirements
> and additional latency, would be hard to justify for something
> that can otherwise be done with module_param.
>
You are describing your deployment of RDS, not the kernel implementation.
What I see is a very rigid implementation that would make it hard for
many of us to ever even consider deploying. The ability of applications
to tune TCP connections is well understood, very prevalent, and really
fundamental to making TCP-based datacenters work at large scale. Anyway,
it was just an idea... ;-)

> As such, the above suggestion is even more non-trivial than the "ugly" code
> I referred to in my previous email.
>
> It is actually far simpler to just tell the cluster-setup scripts to
>   refrain from an ifup of the relevant interfaces till all the config
> is set up.
>
> Besides, the basic problem remains: for an arbitrary kernel module
> that has parameters that need to be customized before module init,
> what are the options today?
>
Maybe add one module parameter that indicates the module should just
load but not start, configure whatever is needed via netlink, and then
send one more netlink command to start operations.

> --Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-12  4:07         ` Tom Herbert
@ 2016-03-12  4:39           ` Sowmini Varadhan
  2016-03-14 17:57             ` Tom Herbert
  0 siblings, 1 reply; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-12  4:39 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers

On (03/11/16 20:07), Tom Herbert wrote:

> You are describing your deployment of RDS, not kernel implementation.
> What I see is a very rigid implementation that would make it hard for
> many us to ever even consider deploying. The abilities of applications
> to tune TCP connections is well understood, very prevalent, and really
> fundamental in making TCP based datacenters at large scale.

sorry, historically, OS DDI/DKIs for kernel modules have always
had some way to set up startup parameters at module load time.  And 
clusters have been around for a while. 

So let's just focus on the technical question around module config here,
which involves more than TCP sockets, btw.

>  Any way, it was just an idea... ;-)

Thank you. 

Moving on,

> Maybe add one module parameter that indicates the module should just
> load but not start, configure whatever is needed via netlink, and then
> send one more netlink command to start operations.

Even that needs an extra daemon. 

Without getting into the vast number of questions that it raises (such
as: does every module with startup params now need a uspace counterpart?
modprobe -r behavior? namespace behavior? why netlink for every kernel
module? etc.)..

One module parameter is as much a "distribution management"
problem as 10 of them, yes? I hope you see that I don't need that module
param and daemon baggage; I can just use sysctl to set up all my params,
including one bit for module_can_start_now, to achieve the same thing.

But it is still more than the handful of lines of code in my patch,
so it would be nice to understand what is the "distribution" issue.

Stepping back, how do we make sysctl fully namespace friendly? 

btw setting kernel socket keepalive params via sysctl is not a problem
to implement at all, if it ever shows up as a requirement for customers.

--Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-12  4:39           ` Sowmini Varadhan
@ 2016-03-14 17:57             ` Tom Herbert
  2016-03-14 18:06               ` Sowmini Varadhan
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Herbert @ 2016-03-14 17:57 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers

On Fri, Mar 11, 2016 at 8:39 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (03/11/16 20:07), Tom Herbert wrote:
>
>> You are describing your deployment of RDS, not kernel implementation.
>> What I see is a very rigid implementation that would make it hard for
>> many of us to ever even consider deploying. The ability of applications
>> to tune TCP connections is well understood, very prevalent, and really
>> fundamental to making TCP-based datacenters work at large scale.
>
> sorry, historically, OS DDI/DKI's for kernel modules have always
> had some way to set up startup parameters at module load time.  And
> clusters have been around for a while.
>
> So let's just focus on the technical question around module config here,
> which involves more than TCP sockets, btw.
>
>>  Any way, it was just an idea... ;-)
>
> Thank you.
>
> Moving on,
>
>> Maybe add one module parameter that indicates the module should just
>> load but not start, configure whatever is needed via netlink, and then
>> send one more netlink command to start operations.
>
> Even that needs an extra daemon.
>
> Without getting into the vast number of questions that it raises (such
> as: does every module with startup params now need a uspace counterpart?
> modprobe -r behavior? namespace behavior? why netlink for every kernel
> module? etc.)..
>
Most modules of any significant complexity are managed via netlink,
and so all of your questions have likely already been answered. Yes,
netlink assumes userspace configuration tools ("ip" configures many
different networking modules). There are simple interfaces to
create/delete a module's netlink hooks when a module is added/removed.
Netlink operates in the context of network namespaces. Netlink is
preferred since it is far more extensible and generic for
configuration than sysctl, module params, etc.
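
A minimal sketch of that control path, using only the Python stdlib
(Linux only; it speaks rtnetlink rather than an RDS-specific netlink
family, which would be hypothetical here). It dumps the link table over
NETLINK_ROUTE, the same protocol "ip link" uses, and needs no privileges
for a read-only dump:

```python
# Hand-rolled rtnetlink dump: userspace configures/queries kernel networking
# modules by exchanging nlmsghdr-framed messages over an AF_NETLINK socket.
import socket
import struct

RTM_GETLINK = 18      # request: dump network interfaces
RTM_NEWLINK = 16      # response: one message per interface
NLM_F_REQUEST = 0x1
NLM_F_DUMP = 0x300    # NLM_F_ROOT | NLM_F_MATCH

sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
sock.bind((0, 0))  # let the kernel assign our netlink port id

# nlmsghdr (len, type, flags, seq, pid) followed by struct ifinfomsg.
ifinfomsg = struct.pack("BxHiII", socket.AF_UNSPEC, 0, 0, 0, 0)
nlmsghdr = struct.pack("IHHII", 16 + len(ifinfomsg), RTM_GETLINK,
                       NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
sock.sendto(nlmsghdr + ifinfomsg, (0, 0))  # pid 0 == the kernel

# The first reply chunk carries at least one RTM_NEWLINK (e.g. loopback).
data = sock.recv(65536)
_, msg_type, _, _, _ = struct.unpack_from("IHHII", data)
print("first reply message type:", msg_type)  # 16 == RTM_NEWLINK
sock.close()
```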

Tom

> One module parameter is as much a "distribution management"
> problem as 10 of them, yes? I hope you see that I don't need that module
> param and daemon baggage; I can just use sysctl to set up all my params,
> including one bit for module_can_start_now, to achieve the same thing.
>
> But it is still more than the handful of lines of code in my patch,
> so it would be nice to understand what is the "distribution" issue.
>
> Stepping back, how do we make sysctl fully namespace friendly?
>
> btw setting kernel socket keepalive params via sysctl is not a problem
> to implement at all, if it ever shows up as a requirement for customers.
>
> --Sowmini
>

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-14 17:57             ` Tom Herbert
@ 2016-03-14 18:06               ` Sowmini Varadhan
  2016-03-14 18:59                 ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Sowmini Varadhan @ 2016-03-14 18:06 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Stephen Hemminger, David S. Miller, Linux Kernel Network Developers


In any case, to wrap up this thread:

I managed to set this up with sysctl. The end result gives me a tunable
per netns (which modparam would not), and thanks to the RDS reconnect
infra, it can be changed dynamically at any time, not just at startup.
And it is more compact than a daemon-y solution.

I'll send out the patches later this week after some more cleanup
and testing.

However, it would still be nice to know exactly what distribution
issues come out of modparam.

--Sowmini

* Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
  2016-03-14 18:06               ` Sowmini Varadhan
@ 2016-03-14 18:59                 ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2016-03-14 18:59 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: tom, stephen, netdev

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Mon, 14 Mar 2016 14:06:42 -0400

> However, it would still be nice to know exactly what distribution
> issues come out of modparam.

Module parameters mean unstable interfaces, and provide inconsistent
ways to do the same exact thing from driver to driver and protocol to
protocol.

Do not use them, ever.

end of thread, other threads:[~2016-03-14 18:59 UTC | newest]

Thread overview: 12+ messages
2016-03-11 18:29 [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket Sowmini Varadhan
2016-03-11 19:09 ` Stephen Hemminger
2016-03-11 19:12   ` Sowmini Varadhan
2016-03-12  2:43   ` Sowmini Varadhan
2016-03-12  3:21     ` Tom Herbert
2016-03-12  3:44       ` Sowmini Varadhan
2016-03-12  4:07         ` Tom Herbert
2016-03-12  4:39           ` Sowmini Varadhan
2016-03-14 17:57             ` Tom Herbert
2016-03-14 18:06               ` Sowmini Varadhan
2016-03-14 18:59                 ` David Miller
2016-03-11 19:14 ` David Miller
