linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cifs: detect dead connections only when echoes are enabled.
@ 2021-04-29  8:12 Shyam Prasad N
  2021-04-29 11:46 ` Aurélien Aptel
  0 siblings, 1 reply; 7+ messages in thread
From: Shyam Prasad N @ 2021-04-29  8:12 UTC (permalink / raw)
  To: Steve French, CIFS, Pavel Shilovsky, Aurélien Aptel

[-- Attachment #1: Type: text/plain, Size: 778 bytes --]

Hi,

Recently, some xfstests and later some manual testing on multi-channel
revealed that we detect unresponsive servers on some of the channels
to the same server.

The issue is seen when a channel is setup and sits idle without any
traffic. Generally, we enable echoes and oplocks on a channel during
the first request, based on the number of credits available on the
channel. So on idle channels, we trip in our logic to check server
unresponsiveness.

Attached a one-line fix for this. Have tested it in my environment.
Another approach to fix this could be to enable echoes during
initialization of a server struct. Or soon after the session setup.
But I felt that this approach is better. Let me know if you feel
otherwise.

Please review and comment.

-- 
Regards,
Shyam

[-- Attachment #2: 0001-cifs-detect-dead-connections-only-when-echoes-are-en.patch --]
[-- Type: application/octet-stream, Size: 1286 bytes --]

From dc604e09eadf0603e2f3ddc7eeee0f81e8d5652c Mon Sep 17 00:00:00 2001
From: Shyam Prasad N <sprasad@microsoft.com>
Date: Thu, 29 Apr 2021 07:53:18 +0000
Subject: [PATCH] cifs: detect dead connections only when echoes are enabled.

We can detect server unresponsiveness only if echoes are enabled.
Echoes can be disabled under two scenarios:
1. The connection is low on credits, so we've disabled echoes/oplocks.
2. The connection has not seen any request till now (other than
negotiate/sess-setup), which is when we enable these two, based on
the credits available.

So this fix will check for dead connection, only when echo is enabled.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/cifs/connect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 121d8b4535b0..d430e57e74bc 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -476,6 +476,7 @@ server_unresponsive(struct TCP_Server_Info *server)
 	 */
 	if ((server->tcpStatus == CifsGood ||
 	    server->tcpStatus == CifsNeedNegotiate) &&
+		server->echoes &&
 	    time_after(jiffies, server->lstrp + 3 * server->echo_interval)) {
 		cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
 			 (3 * server->echo_interval) / HZ);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29  8:12 [PATCH] cifs: detect dead connections only when echoes are enabled Shyam Prasad N
@ 2021-04-29 11:46 ` Aurélien Aptel
  2021-04-29 12:29   ` Shyam Prasad N
  0 siblings, 1 reply; 7+ messages in thread
From: Aurélien Aptel @ 2021-04-29 11:46 UTC (permalink / raw)
  To: Shyam Prasad N, Steve French, CIFS, Pavel Shilovsky


Hi Shyam,

Ok so I ended up looking at a lot of code to check this... And I'm still
unsure.

First, it looks like server->echoes is only used for smb2 and there is
a generic server->ops->can_echo() you can use that just returns
server->echoes it for smb2.

Shyam Prasad N <nspmangalore@gmail.com> writes:
> Hi,
>
> Recently, some xfstests and later some manual testing on multi-channel
> revealed that we detect unresponsive servers on some of the channels
> to the same server.
>
> The issue is seen when a channel is setup and sits idle without any
> traffic. Generally, we enable echoes and oplocks on a channel during
> the first request, based on the number of credits available on the
> channel. So on idle channels, we trip in our logic to check server
> unresponsiveness.

That makes sense but while looking at the code I see we always queue
echo request in cifs_get_tcp_session(), which is called when adding a
channel.

cifs_ses_add_channel() {
	ctx.echo_interval = ses->server->echo_interval / HZ;

	chan->server = cifs_get_tcp_session(&ctx);

	rc = cifs_negotiate_protocol(xid, ses) {
		server->tcpStatus = CifsGood;
	}

	rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
}

cifs_get_tcp_session() {

	INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);

	tcp_ses->tcpStatus = CifsNeedNegotiate;

	tcp_ses->lstrp = jiffies;

	if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
		ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
		tcp_ses->echo_interval = ctx->echo_interval * HZ;
	else
		tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;

	queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
}

cifs_echo_request() {

	if (server->tcpStatus == CifsNeedNegotiate)
		echo_interval = 0; <=== branch taken
	else
		echo_interval = server->echo_interval;

	if (server->tcpStatus not in {NeedReconnect, Exiting, New}
	   && server->can_echo()
	   && jiffies > server->lstrp + echo_interval - HZ)
	{
		server->echo();
	}

	queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
        ===> echo_interval = 0 so calls itself immediatly
}

SMB2_echo() {
	if (server->tcpStatus == CifsNeedNegotiate) {
		/* No need to send echo on newly established connections */
		mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
                ====> calls smb2_reconnect_server() immediatly if NeedNego
		return rc;
	}

}

smb2_reconnect_server() {
	// channel has no TCONs so it does essentially nothing
}

server_unresponsive() {
	if (server->tcpStatus in {Good, NeedNegotiate}
           && jiffies > server->lstrp + 3 * server->echo_interval)
        {
		cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
			 (3 * server->echo_interval) / HZ);
		cifs_reconnect(server);
		return true;
	}
        return false;
}

So it looks to me that cifs_get_tcp_session() starts the
cifs_echo_request() work, which calls itself with no delay in an
infinite loop doing nothing (that's probably bad...) until session_setup
succeeds, after which the delay between the self-call is set.

During session_setup():

* last response time (lstrp) gets updated

* sending/recv requests should interact with the server
  credits and call change_conf(), which should enable server->echoes
  ==> is that part not working?

Once enabled, the echo_request workq will finally send the echo on the
wire, which should -/+ 1 credit and update lstrp.

> Attached a one-line fix for this. Have tested it in my environment.
> Another approach to fix this could be to enable echoes during
> initialization of a server struct. Or soon after the session setup.
> But I felt that this approach is better. Let me know if you feel
> otherwise.

I think the idea of your change is ok but there's probably also an issue
in crediting in session_setup()/change_conf() if echoes is not enabled
at this point no?

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29 11:46 ` Aurélien Aptel
@ 2021-04-29 12:29   ` Shyam Prasad N
  2021-04-29 15:47     ` Shyam Prasad N
  0 siblings, 1 reply; 7+ messages in thread
From: Shyam Prasad N @ 2021-04-29 12:29 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Steve French, CIFS, Pavel Shilovsky

Hi Aurélien,

Replies below...

On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@suse.com> wrote:
>
>
> Hi Shyam,
>
> Ok so I ended up looking at a lot of code to check this... And I'm still
> unsure.
>
> First, it looks like server->echoes is only used for smb2 and there is
> a generic server->ops->can_echo() you can use that just returns
> server->echoes it for smb2.
Agree. I can use can_echo() instead.

>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
> > Hi,
> >
> > Recently, some xfstests and later some manual testing on multi-channel
> > revealed that we detect unresponsive servers on some of the channels
> > to the same server.
> >
> > The issue is seen when a channel is setup and sits idle without any
> > traffic. Generally, we enable echoes and oplocks on a channel during
> > the first request, based on the number of credits available on the
> > channel. So on idle channels, we trip in our logic to check server
> > unresponsiveness.
>
> That makes sense but while looking at the code I see we always queue
> echo request in cifs_get_tcp_session(), which is called when adding a
> channel.
>
> cifs_ses_add_channel() {
>         ctx.echo_interval = ses->server->echo_interval / HZ;
>
>         chan->server = cifs_get_tcp_session(&ctx);
>
>         rc = cifs_negotiate_protocol(xid, ses) {
>                 server->tcpStatus = CifsGood;
>         }
>
>         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> }
>
> cifs_get_tcp_session() {
>
>         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
>
>         tcp_ses->tcpStatus = CifsNeedNegotiate;
>
>         tcp_ses->lstrp = jiffies;
>
>         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
>                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
>                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
>         else
>                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
>
>         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> }
>
> cifs_echo_request() {
>
>         if (server->tcpStatus == CifsNeedNegotiate)
>                 echo_interval = 0; <=== branch taken
>         else
>                 echo_interval = server->echo_interval;
>
>         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
>            && server->can_echo()
>            && jiffies > server->lstrp + echo_interval - HZ)
>         {
>                 server->echo();
>         }
>
>         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
>         ===> echo_interval = 0 so calls itself immediatly
> }
>
> SMB2_echo() {
>         if (server->tcpStatus == CifsNeedNegotiate) {
>                 /* No need to send echo on newly established connections */
>                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
>                 ====> calls smb2_reconnect_server() immediatly if NeedNego
>                 return rc;
>         }
>
> }
>
> smb2_reconnect_server() {
>         // channel has no TCONs so it does essentially nothing
> }
>
> server_unresponsive() {
>         if (server->tcpStatus in {Good, NeedNegotiate}
>            && jiffies > server->lstrp + 3 * server->echo_interval)
>         {
>                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
>                          (3 * server->echo_interval) / HZ);
>                 cifs_reconnect(server);
>                 return true;
>         }
>         return false;
> }
>
> So it looks to me that cifs_get_tcp_session() starts the
> cifs_echo_request() work, which calls itself with no delay in an
> infinite loop doing nothing (that's probably bad...) until session_setup
> succeeds, after which the delay between the self-call is set.

Perhaps we think that 1 sec is too much for us to complete negotiate?
But echo_interval of 0 looks bad.

Ideally, we should queue echo work only when session setup is complete.
What do you think?

>
> During session_setup():
>
> * last response time (lstrp) gets updated
>
> * sending/recv requests should interact with the server
>   credits and call change_conf(), which should enable server->echoes
>   ==> is that part not working?
It's working. But since these channels are idle, change_conf is not
called for these channels.
So echoes=false till there's some activity on the channel.

>
> Once enabled, the echo_request workq will finally send the echo on the
> wire, which should -/+ 1 credit and update lstrp.
>
> > Attached a one-line fix for this. Have tested it in my environment.
> > Another approach to fix this could be to enable echoes during
> > initialization of a server struct. Or soon after the session setup.
> > But I felt that this approach is better. Let me know if you feel
> > otherwise.
>
> I think the idea of your change is ok but there's probably also an issue
> in crediting in session_setup()/change_conf() if echoes is not enabled
> at this point no?
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
>


-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29 12:29   ` Shyam Prasad N
@ 2021-04-29 15:47     ` Shyam Prasad N
  2021-04-29 17:37       ` Steve French
  2021-04-29 18:52       ` Steve French
  0 siblings, 2 replies; 7+ messages in thread
From: Shyam Prasad N @ 2021-04-29 15:47 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Steve French, CIFS, Pavel Shilovsky

[-- Attachment #1: Type: text/plain, Size: 5939 bytes --]

Attaching updated patch with can_echo() instead of referencing echoes directly.
I'll submit another patch for the other changes.

Regards,
Shyam

On Thu, Apr 29, 2021 at 5:59 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> Hi Aurélien,
>
> Replies below...
>
> On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@suse.com> wrote:
> >
> >
> > Hi Shyam,
> >
> > Ok so I ended up looking at a lot of code to check this... And I'm still
> > unsure.
> >
> > First, it looks like server->echoes is only used for smb2 and there is
> > a generic server->ops->can_echo() you can use that just returns
> > server->echoes it for smb2.
> Agree. I can use can_echo() instead.
>
> >
> > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > Hi,
> > >
> > > Recently, some xfstests and later some manual testing on multi-channel
> > > revealed that we detect unresponsive servers on some of the channels
> > > to the same server.
> > >
> > > The issue is seen when a channel is setup and sits idle without any
> > > traffic. Generally, we enable echoes and oplocks on a channel during
> > > the first request, based on the number of credits available on the
> > > channel. So on idle channels, we trip in our logic to check server
> > > unresponsiveness.
> >
> > That makes sense but while looking at the code I see we always queue
> > echo request in cifs_get_tcp_session(), which is called when adding a
> > channel.
> >
> > cifs_ses_add_channel() {
> >         ctx.echo_interval = ses->server->echo_interval / HZ;
> >
> >         chan->server = cifs_get_tcp_session(&ctx);
> >
> >         rc = cifs_negotiate_protocol(xid, ses) {
> >                 server->tcpStatus = CifsGood;
> >         }
> >
> >         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> > }
> >
> > cifs_get_tcp_session() {
> >
> >         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
> >
> >         tcp_ses->tcpStatus = CifsNeedNegotiate;
> >
> >         tcp_ses->lstrp = jiffies;
> >
> >         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
> >                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
> >                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
> >         else
> >                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
> >
> >         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> > }
> >
> > cifs_echo_request() {
> >
> >         if (server->tcpStatus == CifsNeedNegotiate)
> >                 echo_interval = 0; <=== branch taken
> >         else
> >                 echo_interval = server->echo_interval;
> >
> >         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
> >            && server->can_echo()
> >            && jiffies > server->lstrp + echo_interval - HZ)
> >         {
> >                 server->echo();
> >         }
> >
> >         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
> >         ===> echo_interval = 0 so calls itself immediatly
> > }
> >
> > SMB2_echo() {
> >         if (server->tcpStatus == CifsNeedNegotiate) {
> >                 /* No need to send echo on newly established connections */
> >                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
> >                 ====> calls smb2_reconnect_server() immediatly if NeedNego
> >                 return rc;
> >         }
> >
> > }
> >
> > smb2_reconnect_server() {
> >         // channel has no TCONs so it does essentially nothing
> > }
> >
> > server_unresponsive() {
> >         if (server->tcpStatus in {Good, NeedNegotiate}
> >            && jiffies > server->lstrp + 3 * server->echo_interval)
> >         {
> >                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
> >                          (3 * server->echo_interval) / HZ);
> >                 cifs_reconnect(server);
> >                 return true;
> >         }
> >         return false;
> > }
> >
> > So it looks to me that cifs_get_tcp_session() starts the
> > cifs_echo_request() work, which calls itself with no delay in an
> > infinite loop doing nothing (that's probably bad...) until session_setup
> > succeeds, after which the delay between the self-call is set.
>
> Perhaps we think that 1 sec is too much for us to complete negotiate?
> But echo_interval of 0 looks bad.
>
> Ideally, we should queue echo work only when session setup is complete.
> What do you think?
>
> >
> > During session_setup():
> >
> > * last response time (lstrp) gets updated
> >
> > * sending/recv requests should interact with the server
> >   credits and call change_conf(), which should enable server->echoes
> >   ==> is that part not working?
> It's working. But since these channels are idle, change_conf is not
> called for these channels.
> So echoes=false till there's some activity on the channel.
>
> >
> > Once enabled, the echo_request workq will finally send the echo on the
> > wire, which should -/+ 1 credit and update lstrp.
> >
> > > Attached a one-line fix for this. Have tested it in my environment.
> > > Another approach to fix this could be to enable echoes during
> > > initialization of a server struct. Or soon after the session setup.
> > > But I felt that this approach is better. Let me know if you feel
> > > otherwise.
> >
> > I think the idea of your change is ok but there's probably also an issue
> > in crediting in session_setup()/change_conf() if echoes is not enabled
> > at this point no?
> >
> > Cheers,
> > --
> > Aurélien Aptel / SUSE Labs Samba Team
> > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> >
>
>
> --
> Regards,
> Shyam



-- 
Regards,
Shyam

[-- Attachment #2: 0001-cifs-detect-dead-connections-only-when-echoes-are-en.patch --]
[-- Type: application/octet-stream, Size: 1332 bytes --]

From 8aeead6d520a199a8c44c9f3df30dde72810dd13 Mon Sep 17 00:00:00 2001
From: Shyam Prasad N <sprasad@microsoft.com>
Date: Thu, 29 Apr 2021 07:53:18 +0000
Subject: [PATCH] cifs: detect dead connections only when echoes are enabled.

We can detect server unresponsiveness only if echoes are enabled.
Echoes can be disabled under two scenarios:
1. The connection is low on credits, so we've disabled echoes/oplocks.
2. The connection has not seen any request till now (other than
negotiate/sess-setup), which is when we enable these two, based on
the credits available.

So this fix will check for dead connection, only when echo is enabled.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/cifs/connect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 121d8b4535b0..becd5f807787 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -476,6 +476,7 @@ server_unresponsive(struct TCP_Server_Info *server)
 	 */
 	if ((server->tcpStatus == CifsGood ||
 	    server->tcpStatus == CifsNeedNegotiate) &&
+	    (!server->ops->can_echo || server->ops->can_echo(server)) &&
 	    time_after(jiffies, server->lstrp + 3 * server->echo_interval)) {
 		cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
 			 (3 * server->echo_interval) / HZ);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29 15:47     ` Shyam Prasad N
@ 2021-04-29 17:37       ` Steve French
  2021-04-29 18:52       ` Steve French
  1 sibling, 0 replies; 7+ messages in thread
From: Steve French @ 2021-04-29 17:37 UTC (permalink / raw)
  To: Shyam Prasad N; +Cc: Aurélien Aptel, CIFS, Pavel Shilovsky

This also reminds me ... perhaps we should increase the echo timeout
when using multichannel, so we don't unnecessarily increase traffic to
the server. Thoughts?

Currently SMB_ECHO_INTERVAL_DEFAULT is 60 seconds

On Thu, Apr 29, 2021 at 10:48 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> Attaching updated patch with can_echo() instead of referencing echoes directly.
> I'll submit another patch for the other changes.
>
> Regards,
> Shyam
>
> On Thu, Apr 29, 2021 at 5:59 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >
> > Hi Aurélien,
> >
> > Replies below...
> >
> > On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@suse.com> wrote:
> > >
> > >
> > > Hi Shyam,
> > >
> > > Ok so I ended up looking at a lot of code to check this... And I'm still
> > > unsure.
> > >
> > > First, it looks like server->echoes is only used for smb2 and there is
> > > a generic server->ops->can_echo() you can use that just returns
> > > server->echoes it for smb2.
> > Agree. I can use can_echo() instead.
> >
> > >
> > > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > > Hi,
> > > >
> > > > Recently, some xfstests and later some manual testing on multi-channel
> > > > revealed that we detect unresponsive servers on some of the channels
> > > > to the same server.
> > > >
> > > > The issue is seen when a channel is setup and sits idle without any
> > > > traffic. Generally, we enable echoes and oplocks on a channel during
> > > > the first request, based on the number of credits available on the
> > > > channel. So on idle channels, we trip in our logic to check server
> > > > unresponsiveness.
> > >
> > > That makes sense but while looking at the code I see we always queue
> > > echo request in cifs_get_tcp_session(), which is called when adding a
> > > channel.
> > >
> > > cifs_ses_add_channel() {
> > >         ctx.echo_interval = ses->server->echo_interval / HZ;
> > >
> > >         chan->server = cifs_get_tcp_session(&ctx);
> > >
> > >         rc = cifs_negotiate_protocol(xid, ses) {
> > >                 server->tcpStatus = CifsGood;
> > >         }
> > >
> > >         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> > > }
> > >
> > > cifs_get_tcp_session() {
> > >
> > >         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
> > >
> > >         tcp_ses->tcpStatus = CifsNeedNegotiate;
> > >
> > >         tcp_ses->lstrp = jiffies;
> > >
> > >         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
> > >                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
> > >                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
> > >         else
> > >                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
> > >
> > >         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> > > }
> > >
> > > cifs_echo_request() {
> > >
> > >         if (server->tcpStatus == CifsNeedNegotiate)
> > >                 echo_interval = 0; <=== branch taken
> > >         else
> > >                 echo_interval = server->echo_interval;
> > >
> > >         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
> > >            && server->can_echo()
> > >            && jiffies > server->lstrp + echo_interval - HZ)
> > >         {
> > >                 server->echo();
> > >         }
> > >
> > >         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
> > >         ===> echo_interval = 0 so calls itself immediatly
> > > }
> > >
> > > SMB2_echo() {
> > >         if (server->tcpStatus == CifsNeedNegotiate) {
> > >                 /* No need to send echo on newly established connections */
> > >                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
> > >                 ====> calls smb2_reconnect_server() immediatly if NeedNego
> > >                 return rc;
> > >         }
> > >
> > > }
> > >
> > > smb2_reconnect_server() {
> > >         // channel has no TCONs so it does essentially nothing
> > > }
> > >
> > > server_unresponsive() {
> > >         if (server->tcpStatus in {Good, NeedNegotiate}
> > >            && jiffies > server->lstrp + 3 * server->echo_interval)
> > >         {
> > >                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
> > >                          (3 * server->echo_interval) / HZ);
> > >                 cifs_reconnect(server);
> > >                 return true;
> > >         }
> > >         return false;
> > > }
> > >
> > > So it looks to me that cifs_get_tcp_session() starts the
> > > cifs_echo_request() work, which calls itself with no delay in an
> > > infinite loop doing nothing (that's probably bad...) until session_setup
> > > succeeds, after which the delay between the self-call is set.
> >
> > Perhaps we think that 1 sec is too much for us to complete negotiate?
> > But echo_interval of 0 looks bad.
> >
> > Ideally, we should queue echo work only when session setup is complete.
> > What do you think?
> >
> > >
> > > During session_setup():
> > >
> > > * last response time (lstrp) gets updated
> > >
> > > * sending/recv requests should interact with the server
> > >   credits and call change_conf(), which should enable server->echoes
> > >   ==> is that part not working?
> > It's working. But since these channels are idle, change_conf is not
> > called for these channels.
> > So echoes=false till there's some activity on the channel.
> >
> > >
> > > Once enabled, the echo_request workq will finally send the echo on the
> > > wire, which should -/+ 1 credit and update lstrp.
> > >
> > > > Attached a one-line fix for this. Have tested it in my environment.
> > > > Another approach to fix this could be to enable echoes during
> > > > initialization of a server struct. Or soon after the session setup.
> > > > But I felt that this approach is better. Let me know if you feel
> > > > otherwise.
> > >
> > > I think the idea of your change is ok but there's probably also an issue
> > > in crediting in session_setup()/change_conf() if echoes is not enabled
> > > at this point no?
> > >
> > > Cheers,
> > > --
> > > Aurélien Aptel / SUSE Labs Samba Team
> > > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> > >
> >
> >
> > --
> > Regards,
> > Shyam
>
>
>
> --
> Regards,
> Shyam



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29 15:47     ` Shyam Prasad N
  2021-04-29 17:37       ` Steve French
@ 2021-04-29 18:52       ` Steve French
  2021-05-01 16:40         ` Shyam Prasad N
  1 sibling, 1 reply; 7+ messages in thread
From: Steve French @ 2021-04-29 18:52 UTC (permalink / raw)
  To: Shyam Prasad N; +Cc: Aurélien Aptel, CIFS, Pavel Shilovsky

FYI kicked off buildbot with your patch (and Rohiths)

http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/11/builds/41

On Thu, Apr 29, 2021 at 10:48 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> Attaching updated patch with can_echo() instead of referencing echoes directly.
> I'll submit another patch for the other changes.
>
> Regards,
> Shyam
>
> On Thu, Apr 29, 2021 at 5:59 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >
> > Hi Aurélien,
> >
> > Replies below...
> >
> > On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@suse.com> wrote:
> > >
> > >
> > > Hi Shyam,
> > >
> > > Ok so I ended up looking at a lot of code to check this... And I'm still
> > > unsure.
> > >
> > > First, it looks like server->echoes is only used for smb2 and there is
> > > a generic server->ops->can_echo() you can use that just returns
> > > server->echoes it for smb2.
> > Agree. I can use can_echo() instead.
> >
> > >
> > > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > > Hi,
> > > >
> > > > Recently, some xfstests and later some manual testing on multi-channel
> > > > revealed that we detect unresponsive servers on some of the channels
> > > > to the same server.
> > > >
> > > > The issue is seen when a channel is setup and sits idle without any
> > > > traffic. Generally, we enable echoes and oplocks on a channel during
> > > > the first request, based on the number of credits available on the
> > > > channel. So on idle channels, we trip in our logic to check server
> > > > unresponsiveness.
> > >
> > > That makes sense but while looking at the code I see we always queue
> > > echo request in cifs_get_tcp_session(), which is called when adding a
> > > channel.
> > >
> > > cifs_ses_add_channel() {
> > >         ctx.echo_interval = ses->server->echo_interval / HZ;
> > >
> > >         chan->server = cifs_get_tcp_session(&ctx);
> > >
> > >         rc = cifs_negotiate_protocol(xid, ses) {
> > >                 server->tcpStatus = CifsGood;
> > >         }
> > >
> > >         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> > > }
> > >
> > > cifs_get_tcp_session() {
> > >
> > >         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
> > >
> > >         tcp_ses->tcpStatus = CifsNeedNegotiate;
> > >
> > >         tcp_ses->lstrp = jiffies;
> > >
> > >         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
> > >                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
> > >                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
> > >         else
> > >                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
> > >
> > >         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> > > }
> > >
> > > cifs_echo_request() {
> > >
> > >         if (server->tcpStatus == CifsNeedNegotiate)
> > >                 echo_interval = 0; <=== branch taken
> > >         else
> > >                 echo_interval = server->echo_interval;
> > >
> > >         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
> > >            && server->can_echo()
> > >            && jiffies > server->lstrp + echo_interval - HZ)
> > >         {
> > >                 server->echo();
> > >         }
> > >
> > >         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
> > >         ===> echo_interval = 0 so calls itself immediatly
> > > }
> > >
> > > SMB2_echo() {
> > >         if (server->tcpStatus == CifsNeedNegotiate) {
> > >                 /* No need to send echo on newly established connections */
> > >                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
> > >                 ====> calls smb2_reconnect_server() immediatly if NeedNego
> > >                 return rc;
> > >         }
> > >
> > > }
> > >
> > > smb2_reconnect_server() {
> > >         // channel has no TCONs so it does essentially nothing
> > > }
> > >
> > > server_unresponsive() {
> > >         if (server->tcpStatus in {Good, NeedNegotiate}
> > >            && jiffies > server->lstrp + 3 * server->echo_interval)
> > >         {
> > >                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
> > >                          (3 * server->echo_interval) / HZ);
> > >                 cifs_reconnect(server);
> > >                 return true;
> > >         }
> > >         return false;
> > > }
> > >
> > > So it looks to me that cifs_get_tcp_session() starts the
> > > cifs_echo_request() work, which calls itself with no delay in an
> > > infinite loop doing nothing (that's probably bad...) until session_setup
> > > succeeds, after which the delay between the self-call is set.
> >
> > Perhaps we think that 1 sec is too much for us to complete negotiate?
> > But echo_interval of 0 looks bad.
> >
> > Ideally, we should queue echo work only when session setup is complete.
> > What do you think?
> >
> > >
> > > During session_setup():
> > >
> > > * last response time (lstrp) gets updated
> > >
> > > * sending/recv requests should interact with the server
> > >   credits and call change_conf(), which should enable server->echoes
> > >   ==> is that part not working?
> > It's working. But since these channels are idle, change_conf is not
> > called for these channels.
> > So echoes=false till there's some activity on the channel.
> >
> > >
> > > Once enabled, the echo_request workq will finally send the echo on the
> > > wire, which should -/+ 1 credit and update lstrp.
> > >
> > > > Attached a one-line fix for this. Have tested it in my environment.
> > > > Another approach to fix this could be to enable echoes during
> > > > initialization of a server struct. Or soon after the session setup.
> > > > But I felt that this approach is better. Let me know if you feel
> > > > otherwise.
> > >
> > > I think the idea of your change is ok but there's probably also an issue
> > > in crediting in session_setup()/change_conf() if echoes is not enabled
> > > at this point no?
> > >
> > > Cheers,
> > > --
> > > Aurélien Aptel / SUSE Labs Samba Team
> > > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> > >
> >
> >
> > --
> > Regards,
> > Shyam
>
>
>
> --
> Regards,
> Shyam



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: detect dead connections only when echoes are enabled.
  2021-04-29 18:52       ` Steve French
@ 2021-05-01 16:40         ` Shyam Prasad N
  0 siblings, 0 replies; 7+ messages in thread
From: Shyam Prasad N @ 2021-05-01 16:40 UTC (permalink / raw)
  To: Steve French; +Cc: Aurélien Aptel, CIFS, Pavel Shilovsky

[-- Attachment #1: Type: text/plain, Size: 7540 bytes --]

Hi Aurelien,

I did some reading in this code path, and it seems like echo_interval
was changed to 0 for NeedNegotiate cases to handle cases of too much
delay between echoes causing the server to be declared as
unresponsive.
This was back when we used 2*echo_interval as the timeout to declare
the server as unresponsive. Later, Ronnie changed this to
3*echo_interval.
So we should not need to change echo_interval to 0 anymore.

Attached patch has the fix.
Please let me know what you think about this fix.

Regards,
Shyam

On Fri, Apr 30, 2021 at 12:23 AM Steve French <smfrench@gmail.com> wrote:
>
> FYI kicked off buildbot with your patch (and Rohiths)
>
> http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/11/builds/41
>
> On Thu, Apr 29, 2021 at 10:48 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >
> > Attaching updated patch with can_echo() instead of referencing echoes directly.
> > I'll submit another patch for the other changes.
> >
> > Regards,
> > Shyam
> >
> > On Thu, Apr 29, 2021 at 5:59 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> > >
> > > Hi Aurélien,
> > >
> > > Replies below...
> > >
> > > On Thu, Apr 29, 2021 at 5:16 PM Aurélien Aptel <aaptel@suse.com> wrote:
> > > >
> > > >
> > > > Hi Shyam,
> > > >
> > > > Ok so I ended up looking at a lot of code to check this... And I'm still
> > > > unsure.
> > > >
> > > > First, it looks like server->echoes is only used for smb2 and there is
> > > > a generic server->ops->can_echo() you can use that just returns
> > > > server->echoes it for smb2.
> > > Agree. I can use can_echo() instead.
> > >
> > > >
> > > > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > > > Hi,
> > > > >
> > > > > Recently, some xfstests and later some manual testing on multi-channel
> > > > > revealed that we detect unresponsive servers on some of the channels
> > > > > to the same server.
> > > > >
> > > > > The issue is seen when a channel is setup and sits idle without any
> > > > > traffic. Generally, we enable echoes and oplocks on a channel during
> > > > > the first request, based on the number of credits available on the
> > > > > channel. So on idle channels, we trip in our logic to check server
> > > > > unresponsiveness.
> > > >
> > > > That makes sense but while looking at the code I see we always queue
> > > > echo request in cifs_get_tcp_session(), which is called when adding a
> > > > channel.
> > > >
> > > > cifs_ses_add_channel() {
> > > >         ctx.echo_interval = ses->server->echo_interval / HZ;
> > > >
> > > >         chan->server = cifs_get_tcp_session(&ctx);
> > > >
> > > >         rc = cifs_negotiate_protocol(xid, ses) {
> > > >                 server->tcpStatus = CifsGood;
> > > >         }
> > > >
> > > >         rc = cifs_setup_session(xid, ses, cifs_sb->local_nls);
> > > > }
> > > >
> > > > cifs_get_tcp_session() {
> > > >
> > > >         INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
> > > >
> > > >         tcp_ses->tcpStatus = CifsNeedNegotiate;
> > > >
> > > >         tcp_ses->lstrp = jiffies;
> > > >
> > > >         if (ctx->echo_interval >= SMB_ECHO_INTERVAL_MIN &&
> > > >                 ctx->echo_interval <= SMB_ECHO_INTERVAL_MAX)
> > > >                 tcp_ses->echo_interval = ctx->echo_interval * HZ;
> > > >         else
> > > >                 tcp_ses->echo_interval = SMB_ECHO_INTERVAL_DEFAULT * HZ;
> > > >
> > > >         queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);
> > > > }
> > > >
> > > > cifs_echo_request() {
> > > >
> > > >         if (server->tcpStatus == CifsNeedNegotiate)
> > > >                 echo_interval = 0; <=== branch taken
> > > >         else
> > > >                 echo_interval = server->echo_interval;
> > > >
> > > >         if (server->tcpStatus not in {NeedReconnect, Exiting, New}
> > > >            && server->can_echo()
> > > >            && jiffies > server->lstrp + echo_interval - HZ)
> > > >         {
> > > >                 server->echo();
> > > >         }
> > > >
> > > >         queue_delayed_work(cifsiod_wq, &server->echo, server->echo_interval);
> > > >         ===> echo_interval = 0 so calls itself immediatly
> > > > }
> > > >
> > > > SMB2_echo() {
> > > >         if (server->tcpStatus == CifsNeedNegotiate) {
> > > >                 /* No need to send echo on newly established connections */
> > > >                 mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
> > > >                 ====> calls smb2_reconnect_server() immediatly if NeedNego
> > > >                 return rc;
> > > >         }
> > > >
> > > > }
> > > >
> > > > smb2_reconnect_server() {
> > > >         // channel has no TCONs so it does essentially nothing
> > > > }
> > > >
> > > > server_unresponsive() {
> > > >         if (server->tcpStatus in {Good, NeedNegotiate}
> > > >            && jiffies > server->lstrp + 3 * server->echo_interval)
> > > >         {
> > > >                 cifs_server_dbg(VFS, "has not responded in %lu seconds. Reconnecting...\n",
> > > >                          (3 * server->echo_interval) / HZ);
> > > >                 cifs_reconnect(server);
> > > >                 return true;
> > > >         }
> > > >         return false;
> > > > }
> > > >
> > > > So it looks to me that cifs_get_tcp_session() starts the
> > > > cifs_echo_request() work, which calls itself with no delay in an
> > > > infinite loop doing nothing (that's probably bad...) until session_setup
> > > > succeeds, after which the delay between the self-call is set.
> > >
> > > Perhaps we think that 1 sec is too much for us to complete negotiate?
> > > But echo_interval of 0 looks bad.
> > >
> > > Ideally, we should queue echo work only when session setup is complete.
> > > What do you think?
> > >
> > > >
> > > > During session_setup():
> > > >
> > > > * last response time (lstrp) gets updated
> > > >
> > > > * sending/recv requests should interact with the server
> > > >   credits and call change_conf(), which should enable server->echoes
> > > >   ==> is that part not working?
> > > It's working. But since these channels are idle, change_conf is not
> > > called for these channels.
> > > So echoes=false till there's some activity on the channel.
> > >
> > > >
> > > > Once enabled, the echo_request workq will finally send the echo on the
> > > > wire, which should -/+ 1 credit and update lstrp.
> > > >
> > > > > Attached a one-line fix for this. Have tested it in my environment.
> > > > > Another approach to fix this could be to enable echoes during
> > > > > initialization of a server struct. Or soon after the session setup.
> > > > > But I felt that this approach is better. Let me know if you feel
> > > > > otherwise.
> > > >
> > > > I think the idea of your change is ok but there's probably also an issue
> > > > in crediting in session_setup()/change_conf() if echoes is not enabled
> > > > at this point no?
> > > >
> > > > Cheers,
> > > > --
> > > > Aurélien Aptel / SUSE Labs Samba Team
> > > > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Shyam
> >
> >
> >
> > --
> > Regards,
> > Shyam
>
>
>
> --
> Thanks,
>
> Steve



-- 
Regards,
Shyam

[-- Attachment #2: 0001-cifs-use-echo_interval-even-when-connection-not-read.patch --]
[-- Type: application/octet-stream, Size: 1683 bytes --]

From 334ab0efa95f393dff926fa94745d95372c30bf1 Mon Sep 17 00:00:00 2001
From: Shyam Prasad N <sprasad@microsoft.com>
Date: Sat, 1 May 2021 16:17:07 +0000
Subject: [PATCH] cifs: use echo_interval even when connection not ready.

When the tcp connection is not ready to send requests,
we keep retrying echo with an interval of zero.

This seems unnecessary, and this fix changes the interval
between echoes to what is specified as echo_interval.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
---
 fs/cifs/connect.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index becd5f807787..d7efbcb2d5df 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -392,16 +392,6 @@ cifs_echo_request(struct work_struct *work)
 	int rc;
 	struct TCP_Server_Info *server = container_of(work,
 					struct TCP_Server_Info, echo.work);
-	unsigned long echo_interval;
-
-	/*
-	 * If we need to renegotiate, set echo interval to zero to
-	 * immediately call echo service where we can renegotiate.
-	 */
-	if (server->tcpStatus == CifsNeedNegotiate)
-		echo_interval = 0;
-	else
-		echo_interval = server->echo_interval;
 
 	/*
 	 * We cannot send an echo if it is disabled.
@@ -412,7 +402,7 @@ cifs_echo_request(struct work_struct *work)
 	    server->tcpStatus == CifsExiting ||
 	    server->tcpStatus == CifsNew ||
 	    (server->ops->can_echo && !server->ops->can_echo(server)) ||
-	    time_before(jiffies, server->lstrp + echo_interval - HZ))
+	    time_before(jiffies, server->lstrp + server->echo_interval - HZ))
 		goto requeue_echo;
 
 	rc = server->ops->echo ? server->ops->echo(server) : -ENOSYS;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-05-01 16:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-29  8:12 [PATCH] cifs: detect dead connections only when echoes are enabled Shyam Prasad N
2021-04-29 11:46 ` Aurélien Aptel
2021-04-29 12:29   ` Shyam Prasad N
2021-04-29 15:47     ` Shyam Prasad N
2021-04-29 17:37       ` Steve French
2021-04-29 18:52       ` Steve French
2021-05-01 16:40         ` Shyam Prasad N

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).