linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v1] nvme-tcp: enable linger socket option on shutdown
@ 2021-09-03 12:17 Daniel Wagner
  2021-09-06  7:58 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Wagner @ 2021-09-03 12:17 UTC (permalink / raw)
  To: linux-nvme; +Cc: linux-kernel, Daniel Wagner

When the no linger is set, the networking stack sends FIN followed by
RST immediately when shutting down the socket. By enabling linger when
shutting down we have a proper shutdown sequence on the wire.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
The current shutdown sequence on the wire is a bit harsh and
doesn't let the remote host to react. I suppose we should
introduce a short (how long?) linger pause when shutting down
the connection. Thoughs?

 drivers/nvme/host/tcp.c | 1 +
 include/net/sock.h      | 1 +
 net/core/sock.c         | 8 ++++++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e2ab12f3f51c..6c6dc465147a 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1558,6 +1558,7 @@ static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
 
 static void __nvme_tcp_stop_queue(struct nvme_tcp_queue *queue)
 {
+	sock_reset_linger(queue->sock->sk);
 	kernel_sock_shutdown(queue->sock, SHUT_RDWR);
 	nvme_tcp_restore_sock_calls(queue);
 	cancel_work_sync(&queue->io_work);
diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..313a6c8ba51c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2777,6 +2777,7 @@ int sock_set_timestamping(struct sock *sk, int optname,
 
 void sock_enable_timestamps(struct sock *sk);
 void sock_no_linger(struct sock *sk);
+void sock_reset_linger(struct sock *sk);
 void sock_set_keepalive(struct sock *sk);
 void sock_set_priority(struct sock *sk, u32 priority);
 void sock_set_rcvbuf(struct sock *sk, int val);
diff --git a/net/core/sock.c b/net/core/sock.c
index 62627e868e03..23090a01e412 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -755,6 +755,14 @@ void sock_no_linger(struct sock *sk)
 }
 EXPORT_SYMBOL(sock_no_linger);
 
+void sock_reset_linger(struct sock *sk)
+{
+	lock_sock(sk);
+	sock_reset_flag(sk, SOCK_LINGER);
+	release_sock(sk);
+}
+EXPORT_SYMBOL_GPL(sock_reset_linger);
+
 void sock_set_priority(struct sock *sk, u32 priority)
 {
 	lock_sock(sk);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC v1] nvme-tcp: enable linger socket option on shutdown
  2021-09-03 12:17 [RFC v1] nvme-tcp: enable linger socket option on shutdown Daniel Wagner
@ 2021-09-06  7:58 ` Christoph Hellwig
  2021-09-14  8:46   ` Daniel Wagner
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2021-09-06  7:58 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-nvme, linux-kernel

On Fri, Sep 03, 2021 at 02:17:57PM +0200, Daniel Wagner wrote:
> When the no linger is set, the networking stack sends FIN followed by
> RST immediately when shutting down the socket. By enabling linger when
> shutting down we have a proper shutdown sequence on the wire.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> The current shutdown sequence on the wire is a bit harsh and
> doesn't let the remote host to react. I suppose we should
> introduce a short (how long?) linger pause when shutting down
> the connection. Thoughs?

Why?  I'm not really a TCP expert, but why is this different from
say iSCSI or NBD?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC v1] nvme-tcp: enable linger socket option on shutdown
  2021-09-06  7:58 ` Christoph Hellwig
@ 2021-09-14  8:46   ` Daniel Wagner
  2021-09-14 14:20     ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Wagner @ 2021-09-14  8:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, linux-kernel, netdev

On Mon, Sep 06, 2021 at 08:58:20AM +0100, Christoph Hellwig wrote:
> On Fri, Sep 03, 2021 at 02:17:57PM +0200, Daniel Wagner wrote:
> > When the no linger is set, the networking stack sends FIN followed by
> > RST immediately when shutting down the socket. By enabling linger when
> > shutting down we have a proper shutdown sequence on the wire.
> > 
> > Signed-off-by: Daniel Wagner <dwagner@suse.de>
> > ---
> > The current shutdown sequence on the wire is a bit harsh and
> > doesn't let the remote host to react. I suppose we should
> > introduce a short (how long?) linger pause when shutting down
> > the connection. Thoughs?
> 
> Why?  I'm not really a TCP expert, but why is this different from
> say iSCSI or NBD?

I am also no TCP expert. Adding netdev to Cc.

During testing the nvme-tcp subsystem by one of our partners we observed
this. Maybe this is perfectly fine. Just as I said it looks a bit weird
that a proper shutdown of the connection a RST is send out right after
the FIN.

No idea how iSCSI or NBD handles this. I'll check.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC v1] nvme-tcp: enable linger socket option on shutdown
  2021-09-14  8:46   ` Daniel Wagner
@ 2021-09-14 14:20     ` Sagi Grimberg
  2021-09-15  7:54       ` Daniel Wagner
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2021-09-14 14:20 UTC (permalink / raw)
  To: Daniel Wagner, Christoph Hellwig; +Cc: linux-nvme, linux-kernel, netdev


>>> When the no linger is set, the networking stack sends FIN followed by
>>> RST immediately when shutting down the socket. By enabling linger when
>>> shutting down we have a proper shutdown sequence on the wire.
>>>
>>> Signed-off-by: Daniel Wagner <dwagner@suse.de>
>>> ---
>>> The current shutdown sequence on the wire is a bit harsh and
>>> doesn't let the remote host to react. I suppose we should
>>> introduce a short (how long?) linger pause when shutting down
>>> the connection. Thoughs?
>>
>> Why?  I'm not really a TCP expert, but why is this different from
>> say iSCSI or NBD?
> 
> I am also no TCP expert. Adding netdev to Cc.
> 
> During testing the nvme-tcp subsystem by one of our partners we observed
> this. Maybe this is perfectly fine. Just as I said it looks a bit weird
> that a proper shutdown of the connection a RST is send out right after
> the FIN.

The point here is that when we close the connection we may have inflight
requests that we already failed to upper layers and we don't want them
to get through as we proceed to error handling. This is why we want the
socket to go away asap.

> No idea how iSCSI or NBD handles this. I'll check.

iSCSI does the same thing in essence (with a minor variation because in
iscsi we have a logout message which we don't have in nvme).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC v1] nvme-tcp: enable linger socket option on shutdown
  2021-09-14 14:20     ` Sagi Grimberg
@ 2021-09-15  7:54       ` Daniel Wagner
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2021-09-15  7:54 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, linux-kernel, netdev

On Tue, Sep 14, 2021 at 05:20:46PM +0300, Sagi Grimberg wrote:
> > During testing the nvme-tcp subsystem by one of our partners we observed
> > this. Maybe this is perfectly fine. Just as I said it looks a bit weird
> > that a proper shutdown of the connection a RST is send out right after
> > the FIN.
> 
> The point here is that when we close the connection we may have inflight
> requests that we already failed to upper layers and we don't want them
> to get through as we proceed to error handling. This is why we want the
> socket to go away asap.

Thanks for the explanation. The RST is on purpose, got it.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-15  7:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 12:17 [RFC v1] nvme-tcp: enable linger socket option on shutdown Daniel Wagner
2021-09-06  7:58 ` Christoph Hellwig
2021-09-14  8:46   ` Daniel Wagner
2021-09-14 14:20     ` Sagi Grimberg
2021-09-15  7:54       ` Daniel Wagner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).