From: Mingbao Sun <sunmingbao@tom.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, tyler.sun@dell.com, ping.gan@dell.com,
	yanxiu.cai@dell.com, libin.zhang@dell.com, ao.sun@dell.com
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Tue, 29 Mar 2022 10:48:06 +0800
Message-ID: <20220329104806.00000126@tom.com>
In-Reply-To: <b7b5106a-9c0d-db49-00ab-234756955de8@grimberg.me>

> As I said, TCP can be tuned in various ways, congestion being just one
> of them. I'm sure you can find a workload where rmem/wmem will make
> a difference.

Agreed.
But the rmem/wmem knobs are different: we can enlarge rmem/wmem for
NVMe/TCP via sysctl, and doing so brings no downside to any other
socket whose rmem/wmem are not explicitly specified.
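
A minimal userspace sketch (my illustration, not part of the patch set)
of what "explicitly specified" means here: a socket that sets its own
buffer sizes is locked to them and opts out of auto-tuning, so raising
the net.ipv4.tcp_rmem/tcp_wmem limits for the NVMe/TCP workload leaves
such sockets alone and merely gives auto-tuned sockets more headroom.

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int bufsz = 4 * 1024 * 1024;	/* example value; the kernel roughly doubles it */

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Explicitly specifying rmem/wmem for this socket only:
	 * SOCK_SNDBUF_LOCK/SOCK_RCVBUF_LOCK get set, so auto-tuning via
	 * the tcp_rmem/tcp_wmem sysctls no longer applies to it.
	 */
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsz, sizeof(bufsz)) ||
	    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsz, sizeof(bufsz)))
		perror("setsockopt");

	return 0;
}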

> In addition, based on my knowledge, application specific TCP level
> tuning (like congestion) is not really a common thing to do. So why in
> nvme-tcp?
> 
> So to me at least, it is not clear why we should add it to the driver.

As mentioned in the commit message, although we can specify the
congestion control for NVMe/TCP via sysctl, i.e. by writing
'/proc/sys/net/ipv4/tcp_congestion_control', doing so also changes the
congestion control of every future TCP socket on the same host that
has not been explicitly assigned one, potentially impacting their
performance (a rough sketch of the per-socket alternative is at the
end of this mail).

For example:

A server in a data center with the following two NICs:

    - NIC_front-end, for interacting with clients through the WAN
      (high latency, ms-level)

    - NIC_back-end, for interacting with the NVMe/TCP target through the LAN
      (low latency, ECN-enabled, ideal for dctcp)

This server interacts with clients (handling requests) via the front-end
network and accesses the NVMe/TCP storage via the back-end network.
This is a normal use case, right?

We cannot control the congestion control used by the client devices,
but it is normally cubic by default (per CONFIG_DEFAULT_TCP_CONG).
So if we change the server's default congestion control to dctcp on
behalf of the NVMe/TCP traffic on the LAN side, we would at the same
time switch the front-end sockets to dctcp while the client side stays
on cubic. That is an unintended result, since dctcp is designed around
ECN-capable, low-latency networks rather than WAN paths.
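
(As an aside, a userspace front-end service could protect itself by
pinning its own algorithm per socket, roughly as in the sketch below;
this is only my illustration and the helper name is made up. The
listener's explicitly set algorithm is kept by accepted sockets. But
the NVMe/TCP host socket is created inside the kernel, with no fd
exposed to userspace, so it has no equivalent today other than the
system-wide default.)

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: pin cubic on a front-end listening socket so a
 * changed system-wide default cannot drag this service onto dctcp.
 */
int make_frontend_listener(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	const char cc[] = "cubic";

	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
		perror("TCP_CONGESTION");	/* falls back to the system default */

	return fd;
}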

In addition, distributed storage products like the following have the
same problem:

    - The product consists of a cluster of servers.

    - Each server serves clients via its front-end NIC
      (WAN, high latency).

    - All servers interact with each other over NVMe/TCP via their
      back-end NICs (LAN, low latency, ECN-enabled, ideal for dctcp).
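
To be concrete about the alternative: what this patch aims for is to
set the algorithm on the NVMe/TCP sockets alone, leaving the
system-wide default untouched. A rough sketch (not the literal patch;
the helper name is mine), built on the tcp_set_congestion_control()
exported by patch 1/3:

#include <net/sock.h>
#include <net/tcp.h>

/* Illustrative helper: apply a congestion-control algorithm to one
 * kernel socket only.  The third argument asks the kernel to load the
 * algorithm's module if needed; the fourth permits algorithms outside
 * tcp_allowed_congestion_control.
 */
static int nvme_tcp_set_sock_congestion(struct socket *sock, const char *name)
{
	int ret;

	lock_sock(sock->sk);
	ret = tcp_set_congestion_control(sock->sk, name, true, true);
	release_sock(sock->sk);

	return ret;
}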
