linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Multichannel patches
@ 2021-06-02 12:43 Shyam Prasad N
  2021-06-02 15:57 ` Aurélien Aptel
  2021-06-02 16:45 ` Paulo Alcantara
  0 siblings, 2 replies; 11+ messages in thread
From: Shyam Prasad N @ 2021-06-02 12:43 UTC (permalink / raw)
  To: Steve French, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

Made a bunch of fixes in mchan reconnect scenarios with these patches.
The reconnect codepath did not play well with mchan. Making it work
involved several changes described in the commits.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/03059a751fa7315ecd44ad61342200d20ab3f0cb.patch
A trivial ref-counting fix in session find codepath.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/2615c7300f93a7ae2d9350da25ed117492f8edbf.patch
Fixes in reconnect codepath, involving changes to introduce a
per-channel bitmask for a session, instead of a boolean, and to
reconnect the session only when we have all channels marked for
reconnect. This makes sessions more resilient in mchan scenarios.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/5bcce1741beea482e770f1c25f4ff285a1505ca4.patch
Fix to get rid of the serialization of requests during channel
binding. This involved passing the channel server pointer to all
negotiate and session setup related functions. This has increased the
diff size, otherwise the changes are quite minimal.

https://github.com/sprasad-microsoft/smb-kernel-client/pull/4

P.S. There is a logic in cifs_reconnect to switch between the targets
for the server. I don't think these changes will break the DFS
scenario. The code will likely take effect only for when the primary
channel reconnects (as DFS server entries are cached with super block
as the key). Perhaps more changes will be needed there to also switch
between the targets for individual channels (maybe use superblock +
channel num as the key for caching entries?). Folks with better
knowledge than me with this code may want to check on this?

@Steve French It'll be good to let a few cycles of buildbot to run
with this code, before submitting to upstream. I have run a bunch of
tests with this. However, more soak time will be safer.

-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-02 12:43 Multichannel patches Shyam Prasad N
@ 2021-06-02 15:57 ` Aurélien Aptel
  2021-06-02 18:55   ` Steve French
  2021-06-02 16:45 ` Paulo Alcantara
  1 sibling, 1 reply; 11+ messages in thread
From: Aurélien Aptel @ 2021-06-02 15:57 UTC (permalink / raw)
  To: Shyam Prasad N, Steve French, CIFS, sribhat.msa, rohiths msft

Hi Shyam,

We've already discussed how to go about it so your changes LGTM.
I've added some minor comments on your github pull request.

Shyam Prasad N <nspmangalore@gmail.com> writes:
> P.S. There is a logic in cifs_reconnect to switch between the targets
> for the server. I don't think these changes will break the DFS
> scenario. The code will likely take effect only for when the primary
> channel reconnects (as DFS server entries are cached with super block
> as the key). Perhaps more changes will be needed there to also switch
> between the targets for individual channels (maybe use superblock +
> channel num as the key for caching entries?). Folks with better
> knowledge than me with this code may want to check on this?

You need to talk with Paulo for this. He has just sent a rewrite of the
cache yday.

> @Steve French It'll be good to let a few cycles of buildbot to run
> with this code, before submitting to upstream. I have run a bunch of
> tests with this. However, more soak time will be safer.

We need more tests on the buildbot so if you have things for mchan we
can run on it please add it there.

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-02 12:43 Multichannel patches Shyam Prasad N
  2021-06-02 15:57 ` Aurélien Aptel
@ 2021-06-02 16:45 ` Paulo Alcantara
  2021-06-04  9:13   ` Shyam Prasad N
  1 sibling, 1 reply; 11+ messages in thread
From: Paulo Alcantara @ 2021-06-02 16:45 UTC (permalink / raw)
  To: Shyam Prasad N, Steve French, Aurélien Aptel, CIFS,
	sribhat.msa, rohiths msft

Shyam Prasad N <nspmangalore@gmail.com> writes:

> P.S. There is a logic in cifs_reconnect to switch between the targets
> for the server. I don't think these changes will break the DFS
> scenario. The code will likely take effect only for when the primary
> channel reconnects (as DFS server entries are cached with super block
> as the key). Perhaps more changes will be needed there to also switch
> between the targets for individual channels (maybe use superblock +
> channel num as the key for caching entries?). Folks with better
> knowledge than me with this code may want to check on this?

I don't think your changes would break it as well.  The super is only
used for providing cifs_sb_info::origin_fullpath as key to find the
corresponding failover targets in referral cache.

I can help you testing your changes with some DFS+multichannel failover
scenarios.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-02 15:57 ` Aurélien Aptel
@ 2021-06-02 18:55   ` Steve French
  0 siblings, 0 replies; 11+ messages in thread
From: Steve French @ 2021-06-02 18:55 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Shyam Prasad N, CIFS, sribhat.msa, rohiths msft

Looking through the PR - agree with Aurelien's comments (mostly minor)

On Wed, Jun 2, 2021 at 10:57 AM Aurélien Aptel <aaptel@suse.com> wrote:
>
> Hi Shyam,
>
> We've already discussed how to go about it so your changes LGTM.
> I've added some minor comments on your github pull request.
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
> > P.S. There is a logic in cifs_reconnect to switch between the targets
> > for the server. I don't think these changes will break the DFS
> > scenario. The code will likely take effect only for when the primary
> > channel reconnects (as DFS server entries are cached with super block
> > as the key). Perhaps more changes will be needed there to also switch
> > between the targets for individual channels (maybe use superblock +
> > channel num as the key for caching entries?). Folks with better
> > knowledge than me with this code may want to check on this?
>
> You need to talk with Paulo for this. He has just sent a rewrite of the
> cache yday.
>
> > @Steve French It'll be good to let a few cycles of buildbot to run
> > with this code, before submitting to upstream. I have run a bunch of
> > tests with this. However, more soak time will be safer.
>
> We need more tests on the buildbot so if you have things for mchan we
> can run on it please add it there.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
>


-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-02 16:45 ` Paulo Alcantara
@ 2021-06-04  9:13   ` Shyam Prasad N
  2021-06-04 15:10     ` Steve French
  2021-06-04 15:25     ` Paulo Alcantara
  0 siblings, 2 replies; 11+ messages in thread
From: Shyam Prasad N @ 2021-06-04  9:13 UTC (permalink / raw)
  To: Paulo Alcantara
  Cc: Steve French, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

Thanks for the reviews.

While rebasing older changes, I missed a single line during the
channel add code, which had caused a regression.
Fixed it and merged with the correct patch here:
https://github.com/sprasad-microsoft/smb-kernel-client/pull/5

I've also integrated Aurelien's comments into these patches.
Also, I've a couple of minor patches here to fix fscache warnings for
multichannel and to integrate new changes into DebugData.
@Steve French Please use these new patches.

@Paulo Alcantara That would be great if you can help testing my
changes. Please test with these new changes.
> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
I'm wondering what would happen if there are multiple tcons to the
same origin_fullpath (possibly in different sessions)?
Also, doesn't failover targets apply to each channel under a session?
Shouldn't we switch targets on reconnect of secondary channels too?

On Wed, Jun 2, 2021 at 10:15 PM Paulo Alcantara <pc@cjr.nz> wrote:
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
>
> > P.S. There is a logic in cifs_reconnect to switch between the targets
> > for the server. I don't think these changes will break the DFS
> > scenario. The code will likely take effect only for when the primary
> > channel reconnects (as DFS server entries are cached with super block
> > as the key). Perhaps more changes will be needed there to also switch
> > between the targets for individual channels (maybe use superblock +
> > channel num as the key for caching entries?). Folks with better
> > knowledge than me with this code may want to check on this?
>
> I don't think your changes would break it as well.  The super is only
> used for providing cifs_sb_info::origin_fullpath as key to find the
> corresponding failover targets in referral cache.
>
> I can help you testing your changes with some DFS+multichannel failover
> scenarios.



-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-04  9:13   ` Shyam Prasad N
@ 2021-06-04 15:10     ` Steve French
  2021-06-04 15:25     ` Paulo Alcantara
  1 sibling, 0 replies; 11+ messages in thread
From: Steve French @ 2021-06-04 15:10 UTC (permalink / raw)
  To: Shyam Prasad N
  Cc: Paulo Alcantara, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

tentatively merged pending review and testing (after minor updates to
formatting in one patch)

buildbot run in progress on these 5 (and also includes Aurelien's small patch)

http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/2/builds/665

On Fri, Jun 4, 2021 at 4:13 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> Thanks for the reviews.
>
> While rebasing older changes, I missed a single line during the
> channel add code, which had caused a regression.
> Fixed it and merged with the correct patch here:
> https://github.com/sprasad-microsoft/smb-kernel-client/pull/5
>
> I've also integrated Aurelien's comments into these patches.
> Also, I've a couple of minor patches here to fix fscache warnings for
> multichannel and to integrate new changes into DebugData.
> @Steve French Please use these new patches.
>
> @Paulo Alcantara That would be great if you can help testing my
> changes. Please test with these new changes.
> > The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> I'm wondering what would happen if there are multiple tcons to the
> same origin_fullpath (possibly in different sessions)?
> Also, doesn't failover targets apply to each channel under a session?
> Shouldn't we switch targets on reconnect of secondary channels too?
>
> On Wed, Jun 2, 2021 at 10:15 PM Paulo Alcantara <pc@cjr.nz> wrote:
> >
> > Shyam Prasad N <nspmangalore@gmail.com> writes:
> >
> > > P.S. There is a logic in cifs_reconnect to switch between the targets
> > > for the server. I don't think these changes will break the DFS
> > > scenario. The code will likely take effect only for when the primary
> > > channel reconnects (as DFS server entries are cached with super block
> > > as the key). Perhaps more changes will be needed there to also switch
> > > between the targets for individual channels (maybe use superblock +
> > > channel num as the key for caching entries?). Folks with better
> > > knowledge than me with this code may want to check on this?
> >
> > I don't think your changes would break it as well.  The super is only
> > used for providing cifs_sb_info::origin_fullpath as key to find the
> > corresponding failover targets in referral cache.
> >
> > I can help you testing your changes with some DFS+multichannel failover
> > scenarios.
>
>
>
> --
> Regards,
> Shyam



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-04  9:13   ` Shyam Prasad N
  2021-06-04 15:10     ` Steve French
@ 2021-06-04 15:25     ` Paulo Alcantara
  2021-06-05 14:08       ` Shyam Prasad N
  1 sibling, 1 reply; 11+ messages in thread
From: Paulo Alcantara @ 2021-06-04 15:25 UTC (permalink / raw)
  To: Shyam Prasad N
  Cc: Steve French, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

Shyam Prasad N <nspmangalore@gmail.com> writes:

> @Paulo Alcantara That would be great if you can help testing my
> changes. Please test with these new changes.

OK.

>> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> I'm wondering what would happen if there are multiple tcons to the
> same origin_fullpath (possibly in different sessions)?

That is certainly a problem, indeed.  I'm waiting for the DFS tests to
finish and then send a series that contains a potential fix for that --
e.g. not sharing TCP servers when mounting DFS shares.  We used to not
share tcons with DFS mounts because they might contain different prefix
paths but connected to same share, however that wasn't enough because
multiple DFS mounts may connect to same target servers, although they
might failover to completely different servers.

> Also, doesn't failover targets apply to each channel under a session?
> Shouldn't we switch targets on reconnect of secondary channels too?

That's a interesting question.  I recall discussing this with Aurelien
some time ago while running a few DFS + multichannel tests.

So yes, I agree with you that when we successfully reconnect to failover
target (primary channel), then we should also update all secondary
channels with the new server's ip address and reconnect them.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-04 15:25     ` Paulo Alcantara
@ 2021-06-05 14:08       ` Shyam Prasad N
  2021-06-05 17:43         ` Steve French
  0 siblings, 1 reply; 11+ messages in thread
From: Shyam Prasad N @ 2021-06-05 14:08 UTC (permalink / raw)
  To: Paulo Alcantara
  Cc: Steve French, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

The buildbot testing once hit a deadlock when running with the above patches.
I found one possibility during cifs_reconnect, where a deadlock can occur.

I've fixed that and some warnings that kernel bots have identified in
the below two patches:
https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/f3e65f72b03b03bc4b301e8e04e9babb0e9582cf.patch
https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/7b3e867e994a7cbc88efe85c95167ae49d4b7a9d.patch

Regards,
Shyam

On Fri, Jun 4, 2021 at 8:55 PM Paulo Alcantara <pc@cjr.nz> wrote:
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
>
> > @Paulo Alcantara That would be great if you can help testing my
> > changes. Please test with these new changes.
>
> OK.
>
> >> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> > I'm wondering what would happen if there are multiple tcons to the
> > same origin_fullpath (possibly in different sessions)?
>
> That is certainly a problem, indeed.  I'm waiting for the DFS tests to
> finish and then send a series that contains a potential fix for that --
> e.g. not sharing TCP servers when mounting DFS shares.  We used to not
> share tcons with DFS mounts because they might contain different prefix
> paths but connected to same share, however that wasn't enough because
> multiple DFS mounts may connect to same target servers, although they
> might failover to completely different servers.
>
> > Also, doesn't failover targets apply to each channel under a session?
> > Shouldn't we switch targets on reconnect of secondary channels too?
>
> That's a interesting question.  I recall discussing this with Aurelien
> some time ago while running a few DFS + multichannel tests.
>
> So yes, I agree with you that when we successfully reconnect to failover
> target (primary channel), then we should also update all secondary
> channels with the new server's ip address and reconnect them.



-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-05 14:08       ` Shyam Prasad N
@ 2021-06-05 17:43         ` Steve French
  2021-06-05 17:57           ` Steve French
  0 siblings, 1 reply; 11+ messages in thread
From: Steve French @ 2021-06-05 17:43 UTC (permalink / raw)
  To: Shyam Prasad N
  Cc: Paulo Alcantara, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

Had to fix two things in those recently updated patches to get them to
build.  You may want to fold those in as well to make the patches
clearer to read and bisectable

smfrench@smfrench-ThinkPad-P52:~/cifs-2.6/fs/cifs$ git diff -a
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 6e4934052159..f340b7d389ef 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -260,7 +260,7 @@ cifs_reconnect(struct TCP_Server_Info *server)

                /* If all channels need reconnect, then tcon needs reconnect */
                if (!CIFS_ALL_CHANS_NEED_RECONNECT(ses))
-                       goto skip_tcon_reconnect;
+                       continue;

                list_for_each(tmp2, &ses->tcon_list) {
                        tcon = list_entry(tmp2, struct cifs_tcon, tcon_list);
@@ -268,8 +268,6 @@ cifs_reconnect(struct TCP_Server_Info *server)
                }
                if (ses->tcon_ipc)
                        ses->tcon_ipc->need_reconnect = true;
-
-skip_tcon_reconnect:
        }
        spin_unlock(&cifs_tcp_ses_lock);
        mutex_unlock(&ses->session_mutex);
diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
index cdbd256be4e4..202d98d06606 100644
--- a/fs/cifs/sess.c
+++ b/fs/cifs/sess.c
@@ -92,10 +92,10 @@ unsigned long cifs_ses_get_chan_index(struct cifs_ses *ses,
 void cifs_chan_set_need_reconnect(struct cifs_ses *ses,
                                   struct TCP_Server_Info *server)
 {
-       size_t chan_index = cifs_ses_get_chan_index(ses, server);
+       unsigned long chan_index = cifs_ses_get_chan_index(ses, server);

        set_bit(chan_index, &ses->chans_need_reconnect);
-       cifs_dbg(FYI, "Set reconnect bitmask for chan %u; now 0x%lx\n",
+       cifs_dbg(FYI, "Set reconnect bitmask for chan %lu; now 0x%lx\n",
                 chan_index, ses->chans_need_reconnect);

On Sat, Jun 5, 2021 at 9:08 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> The buildbot testing once hit a deadlock when running with the above patches.
> I found one possibility during cifs_reconnect, where a deadlock can occur.
>
> I've fixed that and some warnings that kernel bots have identified in
> the below two patches:
> https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/f3e65f72b03b03bc4b301e8e04e9babb0e9582cf.patch
> https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/7b3e867e994a7cbc88efe85c95167ae49d4b7a9d.patch
>
> Regards,
> Shyam
>
> On Fri, Jun 4, 2021 at 8:55 PM Paulo Alcantara <pc@cjr.nz> wrote:
> >
> > Shyam Prasad N <nspmangalore@gmail.com> writes:
> >
> > > @Paulo Alcantara That would be great if you can help testing my
> > > changes. Please test with these new changes.
> >
> > OK.
> >
> > >> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> > > I'm wondering what would happen if there are multiple tcons to the
> > > same origin_fullpath (possibly in different sessions)?
> >
> > That is certainly a problem, indeed.  I'm waiting for the DFS tests to
> > finish and then send a series that contains a potential fix for that --
> > e.g. not sharing TCP servers when mounting DFS shares.  We used to not
> > share tcons with DFS mounts because they might contain different prefix
> > paths but connected to same share, however that wasn't enough because
> > multiple DFS mounts may connect to same target servers, although they
> > might failover to completely different servers.
> >
> > > Also, doesn't failover targets apply to each channel under a session?
> > > Shouldn't we switch targets on reconnect of secondary channels too?
> >
> > That's a interesting question.  I recall discussing this with Aurelien
> > some time ago while running a few DFS + multichannel tests.
> >
> > So yes, I agree with you that when we successfully reconnect to failover
> > target (primary channel), then we should also update all secondary
> > channels with the new server's ip address and reconnect them.
>
>
>
> --
> Regards,
> Shyam



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-05 17:43         ` Steve French
@ 2021-06-05 17:57           ` Steve French
  2021-06-08  9:54             ` Shyam Prasad N
  0 siblings, 1 reply; 11+ messages in thread
From: Steve French @ 2021-06-05 17:57 UTC (permalink / raw)
  To: Shyam Prasad N
  Cc: Paulo Alcantara, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

test run started with those 9 patches (8 multichannel, and Aurelien's
small patch)

http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/11/builds/68

On Sat, Jun 5, 2021 at 12:43 PM Steve French <smfrench@gmail.com> wrote:
>
> Had to fix two things in those recently updated patches to get them to
> build.  You may want to fold those in as well to make the patches
> clearer to read and bisectable
>
> smfrench@smfrench-ThinkPad-P52:~/cifs-2.6/fs/cifs$ git diff -a
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 6e4934052159..f340b7d389ef 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -260,7 +260,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
>
>                 /* If all channels need reconnect, then tcon needs reconnect */
>                 if (!CIFS_ALL_CHANS_NEED_RECONNECT(ses))
> -                       goto skip_tcon_reconnect;
> +                       continue;
>
>                 list_for_each(tmp2, &ses->tcon_list) {
>                         tcon = list_entry(tmp2, struct cifs_tcon, tcon_list);
> @@ -268,8 +268,6 @@ cifs_reconnect(struct TCP_Server_Info *server)
>                 }
>                 if (ses->tcon_ipc)
>                         ses->tcon_ipc->need_reconnect = true;
> -
> -skip_tcon_reconnect:
>         }
>         spin_unlock(&cifs_tcp_ses_lock);
>         mutex_unlock(&ses->session_mutex);
> diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
> index cdbd256be4e4..202d98d06606 100644
> --- a/fs/cifs/sess.c
> +++ b/fs/cifs/sess.c
> @@ -92,10 +92,10 @@ unsigned long cifs_ses_get_chan_index(struct cifs_ses *ses,
>  void cifs_chan_set_need_reconnect(struct cifs_ses *ses,
>                                    struct TCP_Server_Info *server)
>  {
> -       size_t chan_index = cifs_ses_get_chan_index(ses, server);
> +       unsigned long chan_index = cifs_ses_get_chan_index(ses, server);
>
>         set_bit(chan_index, &ses->chans_need_reconnect);
> -       cifs_dbg(FYI, "Set reconnect bitmask for chan %u; now 0x%lx\n",
> +       cifs_dbg(FYI, "Set reconnect bitmask for chan %lu; now 0x%lx\n",
>                  chan_index, ses->chans_need_reconnect);
>
> On Sat, Jun 5, 2021 at 9:08 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> >
> > The buildbot testing once hit a deadlock when running with the above patches.
> > I found one possibility during cifs_reconnect, where a deadlock can occur.
> >
> > I've fixed that and some warnings that kernel bots have identified in
> > the below two patches:
> > https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/f3e65f72b03b03bc4b301e8e04e9babb0e9582cf.patch
> > https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/7b3e867e994a7cbc88efe85c95167ae49d4b7a9d.patch
> >
> > Regards,
> > Shyam
> >
> > On Fri, Jun 4, 2021 at 8:55 PM Paulo Alcantara <pc@cjr.nz> wrote:
> > >
> > > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > >
> > > > @Paulo Alcantara That would be great if you can help testing my
> > > > changes. Please test with these new changes.
> > >
> > > OK.
> > >
> > > >> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> > > > I'm wondering what would happen if there are multiple tcons to the
> > > > same origin_fullpath (possibly in different sessions)?
> > >
> > > That is certainly a problem, indeed.  I'm waiting for the DFS tests to
> > > finish and then send a series that contains a potential fix for that --
> > > e.g. not sharing TCP servers when mounting DFS shares.  We used to not
> > > share tcons with DFS mounts because they might contain different prefix
> > > paths but connected to same share, however that wasn't enough because
> > > multiple DFS mounts may connect to same target servers, although they
> > > might failover to completely different servers.
> > >
> > > > Also, doesn't failover targets apply to each channel under a session?
> > > > Shouldn't we switch targets on reconnect of secondary channels too?
> > >
> > > That's a interesting question.  I recall discussing this with Aurelien
> > > some time ago while running a few DFS + multichannel tests.
> > >
> > > So yes, I agree with you that when we successfully reconnect to failover
> > > target (primary channel), then we should also update all secondary
> > > channels with the new server's ip address and reconnect them.
> >
> >
> >
> > --
> > Regards,
> > Shyam
>
>
>
> --
> Thanks,
>
> Steve



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Multichannel patches
  2021-06-05 17:57           ` Steve French
@ 2021-06-08  9:54             ` Shyam Prasad N
  0 siblings, 0 replies; 11+ messages in thread
From: Shyam Prasad N @ 2021-06-08  9:54 UTC (permalink / raw)
  To: Steve French
  Cc: Paulo Alcantara, Aurélien Aptel, CIFS, sribhat.msa, rohiths msft

Here's a revised list of patches with review comments and fixes:
https://github.com/sprasad-microsoft/smb-kernel-client/pull/7

Please consider this instead of the old patches submitted.

-------

Revised set of patches after reviews and fixes.

Made a bunch of fixes in mchan reconnect scenarios with these patches.
The reconnect codepath did not play well with mchan. Making it work
involved several changes described in the commits.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/36016fc917bec3c324125c203dc1b8d58041ca26.patch
A trivial ref-counting fix in session find codepath.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/6522de38368eaf11306b747e2235d75428fbf487.patch
Fixes in reconnect codepath, involving changes to introduce a
per-channel bitmask for a session, instead of a boolean, and to
reconnect the session only when we have all channels marked for
reconnect. This makes sessions more resilient in mchan scenarios.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/7b591b0b9ea949fbf84bcc3676170fcb7fe8e7ac.patch
Fix to get rid of the serialization of requests during channel
binding. This involved passing the channel server pointer to all
negotiate and session setup related functions. This has increased the
diff size, otherwise the changes are quite minimal.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/83b90f46e37a73bbd563e840d57f519e31f774ed.patch
Fix for getting rid of duplicate fscache cookie warnings for
multichannel scenario.

https://github.com/sprasad-microsoft/smb-kernel-client/commit/394d14080417a037d22db1812a3ca932615fdd25.patch
Adjust DebugData to print connection status of each channel.

On Sat, Jun 5, 2021 at 11:27 PM Steve French <smfrench@gmail.com> wrote:
>
> test run started with those 9 patches (8 multichannel, and Aurelien's
> small patch)
>
> http://smb3-test-rhel-75.southcentralus.cloudapp.azure.com/#/builders/11/builds/68
>
> On Sat, Jun 5, 2021 at 12:43 PM Steve French <smfrench@gmail.com> wrote:
> >
> > Had to fix two things in those recently updated patches to get them to
> > build.  You may want to fold those in as well to make the patches
> > clearer to read and bisectable
> >
> > smfrench@smfrench-ThinkPad-P52:~/cifs-2.6/fs/cifs$ git diff -a
> > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> > index 6e4934052159..f340b7d389ef 100644
> > --- a/fs/cifs/connect.c
> > +++ b/fs/cifs/connect.c
> > @@ -260,7 +260,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >
> >                 /* If all channels need reconnect, then tcon needs reconnect */
> >                 if (!CIFS_ALL_CHANS_NEED_RECONNECT(ses))
> > -                       goto skip_tcon_reconnect;
> > +                       continue;
> >
> >                 list_for_each(tmp2, &ses->tcon_list) {
> >                         tcon = list_entry(tmp2, struct cifs_tcon, tcon_list);
> > @@ -268,8 +268,6 @@ cifs_reconnect(struct TCP_Server_Info *server)
> >                 }
> >                 if (ses->tcon_ipc)
> >                         ses->tcon_ipc->need_reconnect = true;
> > -
> > -skip_tcon_reconnect:
> >         }
> >         spin_unlock(&cifs_tcp_ses_lock);
> >         mutex_unlock(&ses->session_mutex);
> > diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
> > index cdbd256be4e4..202d98d06606 100644
> > --- a/fs/cifs/sess.c
> > +++ b/fs/cifs/sess.c
> > @@ -92,10 +92,10 @@ unsigned long cifs_ses_get_chan_index(struct cifs_ses *ses,
> >  void cifs_chan_set_need_reconnect(struct cifs_ses *ses,
> >                                    struct TCP_Server_Info *server)
> >  {
> > -       size_t chan_index = cifs_ses_get_chan_index(ses, server);
> > +       unsigned long chan_index = cifs_ses_get_chan_index(ses, server);
> >
> >         set_bit(chan_index, &ses->chans_need_reconnect);
> > -       cifs_dbg(FYI, "Set reconnect bitmask for chan %u; now 0x%lx\n",
> > +       cifs_dbg(FYI, "Set reconnect bitmask for chan %lu; now 0x%lx\n",
> >                  chan_index, ses->chans_need_reconnect);
> >
> > On Sat, Jun 5, 2021 at 9:08 AM Shyam Prasad N <nspmangalore@gmail.com> wrote:
> > >
> > > The buildbot testing once hit a deadlock when running with the above patches.
> > > I found one possibility during cifs_reconnect, where a deadlock can occur.
> > >
> > > I've fixed that and some warnings that kernel bots have identified in
> > > the below two patches:
> > > https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/f3e65f72b03b03bc4b301e8e04e9babb0e9582cf.patch
> > > https://github.com/sprasad-microsoft/smb-kernel-client/pull/5/commits/7b3e867e994a7cbc88efe85c95167ae49d4b7a9d.patch
> > >
> > > Regards,
> > > Shyam
> > >
> > > On Fri, Jun 4, 2021 at 8:55 PM Paulo Alcantara <pc@cjr.nz> wrote:
> > > >
> > > > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > >
> > > > > @Paulo Alcantara That would be great if you can help testing my
> > > > > changes. Please test with these new changes.
> > > >
> > > > OK.
> > > >
> > > > >> The super is only used for providing cifs_sb_info::origin_fullpath as key to find the corresponding failover targets in referral cache.
> > > > > I'm wondering what would happen if there are multiple tcons to the
> > > > > same origin_fullpath (possibly in different sessions)?
> > > >
> > > > That is certainly a problem, indeed.  I'm waiting for the DFS tests to
> > > > finish and then send a series that contains a potential fix for that --
> > > > e.g. not sharing TCP servers when mounting DFS shares.  We used to not
> > > > share tcons with DFS mounts because they might contain different prefix
> > > > paths but connected to same share, however that wasn't enough because
> > > > multiple DFS mounts may connect to same target servers, although they
> > > > might failover to completely different servers.
> > > >
> > > > > Also, doesn't failover targets apply to each channel under a session?
> > > > > Shouldn't we switch targets on reconnect of secondary channels too?
> > > >
> > > > That's a interesting question.  I recall discussing this with Aurelien
> > > > some time ago while running a few DFS + multichannel tests.
> > > >
> > > > So yes, I agree with you that when we successfully reconnect to failover
> > > > target (primary channel), then we should also update all secondary
> > > > channels with the new server's ip address and reconnect them.
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shyam
> >
> >
> >
> > --
> > Thanks,
> >
> > Steve
>
>
>
> --
> Thanks,
>
> Steve



-- 
Regards,
Shyam

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-08  9:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 12:43 Multichannel patches Shyam Prasad N
2021-06-02 15:57 ` Aurélien Aptel
2021-06-02 18:55   ` Steve French
2021-06-02 16:45 ` Paulo Alcantara
2021-06-04  9:13   ` Shyam Prasad N
2021-06-04 15:10     ` Steve French
2021-06-04 15:25     ` Paulo Alcantara
2021-06-05 14:08       ` Shyam Prasad N
2021-06-05 17:43         ` Steve French
2021-06-05 17:57           ` Steve French
2021-06-08  9:54             ` Shyam Prasad N

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).