From: Mingbao Sun <sunmingbao@tom.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, tyler.sun@dell.com, ping.gan@dell.com,
	yanxiu.cai@dell.com, libin.zhang@dell.com, ao.sun@dell.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Wed, 30 Mar 2022 15:57:39 +0800	[thread overview]
Message-ID: <20220330155739.00005a9d@tom.com> (raw)
In-Reply-To: <15f24dcd-9a62-8bab-271c-baa9cc693d8d@grimberg.me>

On Tue, 29 Mar 2022 10:46:08 +0300
Sagi Grimberg <sagi@grimberg.me> wrote:

> >> As I said, TCP can be tuned in various ways, congestion being just one
> >> of them. I'm sure you can find a workload where rmem/wmem will make
> >> a difference.  
> > 
> > Agreed.
> > But the difference with the rmem/wmem knobs is that
> > we could enlarge rmem/wmem for NVMe/TCP via sysctl
> > without bringing any downside to other sockets whose
> > rmem/wmem are not explicitly specified.
> 
> It can most certainly affect them, positively or negatively,
> depending on the use-case.

Agreed.
Your wording is more rigorous.
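
(For reference, the rmem/wmem knobs in question are the TCP buffer
autotuning bounds. A minimal sketch; the values below are purely
illustrative, not recommendations:)

  # min / default / max receive and send buffer sizes, in bytes
  sysctl -w net.ipv4.tcp_rmem="4096 131072 6291456"
  sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"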

> >> In addition, based on my knowledge, application specific TCP level
> >> tuning (like congestion) is not really a common thing to do. So why in
> >> nvme-tcp?
> >>
> >> So to me at least, it is not clear why we should add it to the driver.  
> > 
> > As mentioned in the commit message, we can specify the
> > congestion-control of NVMe/TCP via sysctl, by writing to
> > '/proc/sys/net/ipv4/tcp_congestion_control', but this also
> > changes the congestion-control of all future TCP sockets on
> > the same host that have not been explicitly assigned a
> > congestion-control, thus potentially impacting their
> > performance.
> > 
> > For example:
> > 
> > A server in a data-center with the following 2 NICs:
> > 
> >      - NIC_front-end, for interacting with clients through WAN
> >        (high latency, ms-level)
> > 
> >      - NIC_back-end, for interacting with NVMe/TCP target through LAN
> >        (low latency, ECN-enabled, ideal for dctcp)
> > 
> > This server interacts with clients (handling requests) via the
> > front-end network and accesses the NVMe/TCP storage via the
> > back-end network.
> > This is a normal use case, right?
> > 
> > For the client devices, we cannot control their congestion-control,
> > but it is normally cubic by default (per CONFIG_DEFAULT_TCP_CONG).
> > So if we change the server's default congestion-control to dctcp
> > on behalf of the NVMe/TCP traffic on the LAN side, we would at the
> > same time switch the congestion-control of the front-end sockets
> > to dctcp while the client side remains on cubic.
> > This is an unexpected scenario.
> > 
> > In addition, distributed storage products like the following also have
> > the above problem:
> > 
> >      - The product consists of a cluster of servers.
> > 
> >      - Each server serves clients via its front-end NIC
> >       (WAN, high latency).
> > 
> >      - All servers interact with each other via NVMe/TCP over the
> >        back-end NIC (LAN, low latency, ECN-enabled, ideal for dctcp).
> 
> Separate networks are still not application (nvme-tcp) specific and as
> mentioned, we have a way to control that. IMO, this still does not
> qualify as solid justification to add this to nvme-tcp.
> 
> What do others think?

Well, given that the approach (‘ip route …’) proposed
by Jakub can largely satisfy the per-link requirement for
congestion-control, the usefulness of this patchset is really
not so significant.
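
For the record, a minimal sketch of the two knobs discussed in this
thread (assuming dctcp is available on the host, an iproute2 build
with congctl support, and illustrative subnet/device names):

  # load dctcp if it is not built into the kernel
  modprobe tcp_dctcp

  # global default: affects every future TCP socket on the host that
  # does not set its congestion-control explicitly -- the side effect
  # discussed above
  sysctl -w net.ipv4.tcp_congestion_control=dctcp

  # per-route alternative proposed by Jakub: pin dctcp to the back-end
  # LAN subnet only, leaving front-end WAN sockets on the default cubic
  # (192.168.10.0/24 and eth1 are hypothetical; adjust to your setup)
  ip route replace 192.168.10.0/24 dev eth1 congctl dctcp

The per-route form avoids the front-end/back-end conflict described
earlier in this thread without any nvme-tcp specific code.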

So I will close out all the threads of this patchset here.

Finally, many thanks to all of you for reviewing this patchset.
