From: John Meneghini <jmeneghi@redhat.com>
To: linux-nvme@lists.infradead.org, Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH v2 2/3] nvme-tcp: support specifying the congestion-control
Date: Tue, 5 Apr 2022 12:50:24 -0400
Message-ID: <9b45bd0a-872c-7fe2-09b1-1bb54aeef2f2@redhat.com>
In-Reply-To: <9be1e68c-00aa-3547-9cb5-b3ca302e209b@redhat.com>

If you want things to slow down with NVMe, use the protocol's built-in flow control mechanism:
SQ flow control. This keeps commands out of the transport queue and avoids unwanted or
unexpected command timeouts.
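
As an illustration, here is a minimal host-side sketch (hypothetical types, not the
kernel's implementation) of how SQ flow control gates submissions: the controller
reports its current SQ Head Pointer (SQHD) in every completion, and the host stops
submitting when the tail would catch up with that head:

    #include <stdbool.h>
    #include <stdint.h>

    struct sq_state {
        uint16_t head;   /* last SQHD the controller reported */
        uint16_t tail;   /* next free submission queue slot */
        uint16_t qsize;  /* number of SQ entries */
    };

    /* Full when only one free slot remains (head == tail means empty). */
    static bool sq_full(const struct sq_state *sq)
    {
        return (uint16_t)((sq->tail + 1) % sq->qsize) == sq->head;
    }

    /* Hold the command back instead of queueing it in the transport,
     * where it would sit exposed to the per-command timeout. */
    static bool sq_try_submit(struct sq_state *sq)
    {
        if (sq_full(sq))
            return false;
        sq->tail = (sq->tail + 1) % sq->qsize;
        /* ... copy in the SQE and ring the doorbell here ... */
        return true;
    }

    /* On each completion, CQE Dword 2 bits 15:0 carry the new SQHD. */
    static void sq_complete(struct sq_state *sq, uint16_t sqhd)
    {
        sq->head = sqhd;
    }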

But this is another topic for discussion.

/John

On 4/5/22 12:48, John Meneghini wrote:
> 
> On 3/29/22 03:46, Sagi Grimberg wrote:
>>> In addition, distributed storage products like the following also have
>>> the above problem:
>>>
>>>      - The product consists of a cluster of servers.
>>>
>>>      - Each server serves clients via its front-end NIC
>>>       (WAN, high latency).
>>>
>>>      - All servers interact with each other over NVMe/TCP via the back-end NIC
>>>       (LAN, low latency, ECN-enabled, ideal for dctcp).
>>
>> Separate networks are still not application (nvme-tcp) specific and as
>> mentioned, we have a way to control that. IMO, this still does not
>> qualify as solid justification to add this to nvme-tcp.
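>>
>> (For reference, a sketch of the existing generic mechanism, which is
>> not nvme-specific: the congestion control algorithm can already be
>> selected per socket from userspace, or per route via
>> "ip route ... congctl dctcp":)
>>
>>     #include <netinet/in.h>
>>     #include <netinet/tcp.h>
>>     #include <string.h>
>>     #include <sys/socket.h>
>>
>>     /* Select dctcp for this socket only, leaving the system-wide
>>      * default (net.ipv4.tcp_congestion_control) untouched. */
>>     static int set_dctcp(int fd)
>>     {
>>         const char alg[] = "dctcp";
>>         return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
>>                           alg, strlen(alg));
>>     }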
>>
>> What do others think?
> 
> OK. I'll bite.
> 
> In my experience, adding any type of QoS control to a Storage Area Network causes problems
> because it increases the likelihood of ULP timeouts (command timeouts).
> 
> NAS protocols like NFS and CIFS have built-in assumptions about latency: they have long
> timeouts at the session layer and they trade latency for reliable delivery. SAN protocols
> like iSCSI and NVMe/TCP make no such trade-off. All block protocols have much shorter
> per-command timeouts while still expecting reliable delivery, so doing anything to the TCP
> connection that could increase latency risks causing command timeouts as a side effect. In
> NVMe we also have the Keep Alive timeout, which could be affected by TCP latency. It's for
> this reason that most SANs are deployed on LANs, not WANs. It's also why most cluster
> monitor mechanisms (components that maintain cluster-wide membership through heartbeats)
> use UDP, not TCP.
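> 
> To make the budget concrete, here is a minimal sketch (illustrative
> numbers only, matching the Linux nvme driver's io_timeout and Keep
> Alive defaults) of why any latency the transport adds comes straight
> out of the command-timeout and keep-alive windows:
> 
>     #include <stdbool.h>
>     #include <stdint.h>
> 
>     #define IO_TIMEOUT_MS 30000  /* nvme io_timeout default: 30s */
>     #define KATO_MS        5000  /* default Keep Alive interval: 5s */
> 
>     /* A command times out when its total round trip, including any
>      * delay a QoS policy adds inside the transport, exceeds the ULP
>      * (per-command) timeout. */
>     static bool command_within_budget(uint32_t device_ms, uint32_t queue_ms)
>     {
>         return (uint64_t)device_ms + queue_ms < IO_TIMEOUT_MS;
>     }
> 
>     /* A Keep Alive must reach the controller before KATO expires, so
>      * transport queueing delay is subtracted from that window too. */
>     static bool keep_alive_ok(uint32_t since_last_ms, uint32_t delay_ms)
>     {
>         return (uint64_t)since_last_ms + delay_ms < KATO_MS;
>     }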
> 
> With NVMe/TCP we want the connection layer to go as fast as possible, and I agree with Sagi
> that adding any kind of QoS mechanism to the transport is not desirable.
> 
> /John
> 



