From: Sagi Grimberg <sagi@grimberg.me>
To: Mingbao Sun <sunmingbao@tom.com>, Keith Busch <kbusch@kernel.org>,
	Jens Axboe <axboe@fb.com>, Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org
Cc: tyler.sun@dell.com, ping.gan@dell.com, yanxiu.cai@dell.com,
	libin.zhang@dell.com, ao.sun@dell.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Sun, 13 Mar 2022 13:40:52 +0200	[thread overview]
Message-ID: <7121e4be-0e25-dd5f-9d29-0fb02cdbe8de@grimberg.me> (raw)
In-Reply-To: <20220311103414.8255-2-sunmingbao@tom.com>


> From: Mingbao Sun <tyler.sun@dell.com>

Hey Mingbao,

> congestion-control can have a noticeable impact on the
> performance of TCP-based communications. This is of course true
> for NVMe_over_TCP as well.
> 
> Different congestion-controls (e.g., cubic, dctcp) are suited to
> different scenarios. Choosing an appropriate one can benefit
> performance considerably; an ill-suited one can ruin it.
> 
> Although the congestion-control of NVMe_over_TCP can be set by
> writing '/proc/sys/net/ipv4/tcp_congestion_control', doing so also
> changes the default for all future TCP sockets that have not been
> explicitly assigned a congestion-control, with a potential impact
> on their performance.
> 
> So it makes sense for NVMe_over_TCP to support specifying the
> congestion-control itself. This commit addresses the host side.

Thanks for this patchset.

Generally, I'm not opposed to allowing users to customize what they
want to do, but in order to add something like this we need a few
justifications.

1. Can you please provide your measurements that support your claims?

2. Can you please provide a real, existing use-case where this
provides true, measurable value? More specifically, please clarify
how that use-case needs local tuning for nvme-tcp that would not
equally apply to other TCP streams running on the host (and
vice-versa).

3. There are quite a few TCP tuning knobs that affect how nvme-tcp
performs, just as for any TCP application running on Linux. However,
application-level TCP tuning is not widespread at all. What makes
nvme-tcp special enough to allow this, and why is the TCP congestion
control more important than the other tuning knobs? I am not
supportive of exporting all or some TCP-level knobs as a local shadow
of sysctl.
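
For context: an ordinary application that really does want a
per-connection congestion-control already has a standard way to get
one, via setsockopt(2) with TCP_CONGESTION. A minimal userspace
sketch (plain sockets API, nothing nvme-specific; without
CAP_NET_ADMIN the caller is limited to the algorithms listed in
tcp_allowed_congestion_control):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Override the congestion-control of this one socket; the
     * system-wide tcp_congestion_control default is untouched. */
    static int set_cc(int fd, const char *algo)
    {
            return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                              algo, strlen(algo));
    }

nvme-tcp sockets are kernel sockets, so no such userspace path exists
for them - which, I assume, is the motivation for plumbing this
through the fabrics options instead.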

Adding tunables, especially ones that address niche use-cases, can
easily become a slippery slope toward rarely touched code and an
interface we are left stuck with for a long time...

But while this feels a bit random to me, I'm not objecting to adding
this to the driver. I just want to make sure that it is a) really
required and b) does not backfire on us or on the user.

> Implementation approach:
> a new option called 'tcp_congestion' was added to the fabrics
> opt_tokens, allowing the 'nvme connect' command to pass in the
> congestion-control specified by the user.
> Later, in nvme_tcp_alloc_queue, the specified congestion-control
> is applied to the relevant sockets on the host side.
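
If I'm reading the approach right, the queue setup then ends up doing
something along these lines (a sketch pieced together from the
description above and the tcp_set_congestion_control() export in
patch 1/3 - not the patch's literal code, and the option-flag name is
my guess):

    /* in nvme_tcp_alloc_queue(), once the socket exists */
    if (nctrl->opts->mask & NVMF_OPT_TCP_CONGESTION) {
            /* tcp_set_congestion_control() expects the socket lock */
            lock_sock(queue->sock->sk);
            ret = tcp_set_congestion_control(queue->sock->sk,
                            nctrl->opts->tcp_congestion,
                            true, true);
            release_sock(queue->sock->sk);
            if (ret)
                    goto err_sock;
    }

On the command line that would presumably surface as a
'tcp_congestion=dctcp' connect parameter, given the opt_token name.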

Specifically regarding the interface, I'm wondering if this is the
right one... The user is used to sysctl, with the semantics it
provides; wouldn't it be better to expose the exact same interface,
just scoped to nvme-tcp sockets?

Something like sysctl nvme.tcp_congestion_control ?
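
Roughly (a hypothetical knob, just to illustrate the shape of it):

    sysctl -w nvme.tcp_congestion_control=dctcp

That would keep the familiar sysctl semantics while scoping the
effect to nvme-tcp sockets only.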


Thread overview: 22+ messages
2022-03-11 10:34 [PATCH v2 1/3] tcp: export symbol tcp_set_congestion_control Mingbao Sun
2022-03-11 10:34 ` [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control Mingbao Sun
2022-03-13 11:40   ` Sagi Grimberg [this message]
2022-03-14  1:34     ` Mingbao Sun
2022-03-25 12:11     ` Mingbao Sun
2022-03-25 13:44       ` Sagi Grimberg
2022-03-29  2:48         ` Mingbao Sun
2022-03-29  4:33           ` Jakub Kicinski
2022-03-30  7:31             ` Mingbao Sun
2022-03-29  7:46           ` Sagi Grimberg
2022-03-30  7:57             ` Mingbao Sun
2022-03-30 10:27             ` Mingbao Sun
2022-03-31  3:26             ` Mingbao Sun
2022-03-31  5:33             ` Mingbao Sun
2022-04-05 16:48             ` John Meneghini
2022-04-05 16:50               ` John Meneghini
2022-03-25 12:44     ` Mingbao Sun
2022-03-25 14:11     ` Mingbao Sun
2022-03-25 14:46     ` Mingbao Sun
2022-03-14  7:19   ` Christoph Hellwig
2022-03-11 10:34 ` [PATCH v2 3/3] nvmet-tcp: " Mingbao Sun
2022-03-13 11:44   ` Sagi Grimberg
