linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Kernel lockup upon dfs_cache_update
@ 2021-08-10 13:56 Thomas Werschlein
  2021-08-10 15:02 ` Paulo Alcantara
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Werschlein @ 2021-08-10 13:56 UTC (permalink / raw)
  To: linux-cifs

Hi

We are mounting SMB shares from Windows and FreeBSD servers to AD-joined (Ubuntu) Linux clients, using DFS and Kerberos.

A sample /etc/fstab entry looks like this:
//files.d.example.com/shared /files/shared cifs defaults,sec=krb5i,vers=3,seal,multiuser,noserverino,username=LINUXCLIENT$ 0 0

The machine account LINUXCLIENT$ is able to browse the DFS tree and mount the shares.
The credentials for the principal LINUXCLIENT$@D.EXAMPLE.COM are stored in the default keytab (/etc/krb5.keytab).
In order to access a share a user needs a personal KRB ticket (either through kinit or GSSAPI delegated credentials upon ssh login).

This works fine on Ubuntu 18.04 (4.15.0-153-generic Kernel, 2.10 cifs.ko), but doesn't on Ubuntu 20.04 (5.4.0-80-generic, 2.23).
To be precise: it works for the first 10 hours (the ticket lifetime) even on Ubuntu 20.04.

Consider this freshly booted linux host:

root@linuxclient:~# uptime -s
2021-08-09 18:11:29

Approximately 10 hours later, the kernel locks itself up with these messages in kern.log (server names and IPs redacted):

Aug 10 04:15:59 linuxclient kernel: [36115.522194] fs/cifs/connect.c: Existing tcp session with server found
Aug 10 04:15:59 linuxclient kernel: [36115.522197] fs/cifs/dfs_cache.c: CIFS VFS: in do_refresh_tcon as Xid: 398 with uid: 0
Aug 10 04:15:59 linuxclient kernel: [36115.522201] fs/cifs/smb2ops.c: smb2_get_dfs_refer: path: \files.d.example.com\shared
Aug 10 04:15:59 linuxclient kernel: [36115.522206] fs/cifs/smb2pdu.c: SMB2 IOCTL
Aug 10 04:15:59 linuxclient kernel: [36115.522234] fs/cifs/smb2ops.c: Encrypt message returned 0
Aug 10 04:15:59 linuxclient kernel: [36115.522246] fs/cifs/transport.c: Sending smb: smb_len=248
Aug 10 04:15:59 linuxclient kernel: [36115.522816] fs/cifs/connect.c: RFC1002 header 0x7d
Aug 10 04:15:59 linuxclient kernel: [36115.522826] fs/cifs/smb2ops.c: Decrypt message returned 0
Aug 10 04:15:59 linuxclient kernel: [36115.522828] fs/cifs/smb2ops.c: mid found
Aug 10 04:15:59 linuxclient kernel: [36115.522830] fs/cifs/smb2misc.c: SMB2 data length 0 offset 0
Aug 10 04:15:59 linuxclient kernel: [36115.522831] fs/cifs/smb2misc.c: SMB2 len 73
Aug 10 04:15:59 linuxclient kernel: [36115.522832] fs/cifs/smb2ops.c: Session expired or deleted
Aug 10 04:15:59 linuxclient kernel: [36115.522836] fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \files.d.example.com\shared
Aug 10 04:15:59 linuxclient kernel: [36115.522840] fs/cifs/connect.c: cifs_reconnect: will retry 3 target(s)
Aug 10 04:15:59 linuxclient kernel: [36115.522841] fs/cifs/connect.c: Mark tcp session as need reconnect
Aug 10 04:15:59 linuxclient kernel: [36115.522841] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons for reconnect
Aug 10 04:15:59 linuxclient kernel: [36115.522842] fs/cifs/connect.c: cifs_reconnect: tearing down socket
Aug 10 04:15:59 linuxclient kernel: [36115.522844] fs/cifs/connect.c: State: 0x3 Flags: 0x0
Aug 10 04:15:59 linuxclient kernel: [36115.522850] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
Aug 10 04:15:59 linuxclient kernel: [36115.522857] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
Aug 10 04:15:59 linuxclient kernel: [36115.522858] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks
Aug 10 04:15:59 linuxclient kernel: [36115.522862] fs/cifs/connect.c: reconn_inval_dfs_target: UNC: \\files.d.example.com\shared
Aug 10 04:15:59 linuxclient kernel: [36115.522865] fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: probably server name is whole unc: \\files.d.example.com
Aug 10 04:15:59 linuxclient kernel: [36115.522873] fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: files.d.example.com to 10.10.10.10
Aug 10 04:15:59 linuxclient kernel: [36115.522881] fs/cifs/connect.c: Socket created
Aug 10 04:15:59 linuxclient kernel: [36115.522882] fs/cifs/connect.c: sndbuf 16384 rcvbuf 131072 rcvtimeo 0x6d6
Aug 10 04:15:59 linuxclient kernel: [36115.522905] fs/cifs/transport.c: cifs_sync_mid_result: cmd=11 mid=698 state=8
Aug 10 04:15:59 linuxclient kernel: [36115.522907] fs/cifs/smb2pdu.c: SMB2 IOCTL
Aug 10 04:15:59 linuxclient kernel: [36115.523407] fs/cifs/smb2ops.c: set credits to 1 due to smb2 reconnect
Aug 10 04:15:59 linuxclient kernel: [36115.523409] fs/cifs/dfs_cache.c: dfs_cache_noreq_update_tgthint: path: \files.d.example.com\shared
Aug 10 04:15:59 linuxclient kernel: [36115.523410] fs/cifs/dfs_cache.c: do_dfs_cache_find: search path: \files.d.example.com\shared
Aug 10 04:15:59 linuxclient kernel: [36115.523412] fs/cifs/dfs_cache.c: dfs_cache_update_vol: fullpath: \\files.d.example.com\shared
Aug 10 04:16:09 linuxclient kernel: [36125.762187] fs/cifs/smb2pdu.c: Negotiate protocol
Aug 10 04:16:09 linuxclient kernel: [36125.762201] fs/cifs/transport.c: Sending smb: smb_len=108
Aug 10 04:16:15 linuxclient kernel: [36131.906173] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:16:15 linuxclient kernel: [36131.906196] fs/cifs/transport.c: Sending smb: smb_len=72
Aug 10 04:16:15 linuxclient kernel: [36131.906213] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:16:15 linuxclient kernel: [36131.906214] fs/cifs/smb2pdu.c: Need negotiate, reconnecting tcons
Aug 10 04:16:15 linuxclient kernel: [36131.907034] fs/cifs/connect.c: RFC1002 header 0x44
Aug 10 04:16:15 linuxclient kernel: [36131.907041] fs/cifs/smb2misc.c: SMB2 len 68
Aug 10 04:16:15 linuxclient kernel: [36131.907044] fs/cifs/smb2ops.c: add 1 credits total=2353
Aug 10 04:17:16 linuxclient kernel: [36193.346172] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:17:16 linuxclient kernel: [36193.346174] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:17:16 linuxclient kernel: [36193.346199] fs/cifs/transport.c: Sending smb: smb_len=72
Aug 10 04:17:16 linuxclient kernel: [36193.347134] fs/cifs/connect.c: RFC1002 header 0x44
Aug 10 04:17:16 linuxclient kernel: [36193.347140] fs/cifs/smb2misc.c: SMB2 len 68
Aug 10 04:17:16 linuxclient kernel: [36193.347143] fs/cifs/smb2ops.c: add 1 credits total=2353
Aug 10 04:18:14 linuxclient kernel: [36250.690180] INFO: task cifsd:644 blocked for more than 120 seconds.
Aug 10 04:18:14 linuxclient kernel: [36250.690208]       Not tainted 5.4.0-80-generic #90-Ubuntu
Aug 10 04:18:14 linuxclient kernel: [36250.690227] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 10 04:18:14 linuxclient kernel: [36250.690253] cifsd           D    0   644      2 0x80004002
Aug 10 04:18:14 linuxclient kernel: [36250.690255] Call Trace:
Aug 10 04:18:14 linuxclient kernel: [36250.690263]  __schedule+0x2e3/0x740
Aug 10 04:18:14 linuxclient kernel: [36250.690267]  ? vprintk_default+0x29/0x50
Aug 10 04:18:14 linuxclient kernel: [36250.690270]  schedule+0x42/0xb0
Aug 10 04:18:14 linuxclient kernel: [36250.690272]  schedule_preempt_disabled+0xe/0x10
Aug 10 04:18:14 linuxclient kernel: [36250.690274]  __mutex_lock.isra.0+0x178/0x4d0
Aug 10 04:18:14 linuxclient kernel: [36250.690276]  ? _raw_spin_lock_bh+0x30/0x30
Aug 10 04:18:14 linuxclient kernel: [36250.690278]  __mutex_lock_slowpath+0x13/0x20
Aug 10 04:18:14 linuxclient kernel: [36250.690279]  mutex_lock+0x2e/0x40
Aug 10 04:18:14 linuxclient kernel: [36250.690308]  dfs_cache_update_vol+0x41/0x2b0 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690320]  cifs_reconnect+0x623/0xdb0 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690332]  cifs_handle_standard+0x18e/0x1b0 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690347]  smb3_receive_transform+0x2b5/0x400 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690359]  cifs_demultiplex_thread+0x778/0xcd0 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690360]  ? __schedule+0x2eb/0x740
Aug 10 04:18:14 linuxclient kernel: [36250.690363]  kthread+0x104/0x140
Aug 10 04:18:14 linuxclient kernel: [36250.690375]  ? cifs_handle_standard+0x1b0/0x1b0 [cifs]
Aug 10 04:18:14 linuxclient kernel: [36250.690376]  ? kthread_park+0x90/0x90
Aug 10 04:18:14 linuxclient kernel: [36250.690377]  ret_from_fork+0x35/0x40
Aug 10 04:18:18 linuxclient kernel: [36254.786170] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:18:18 linuxclient kernel: [36254.786197] fs/cifs/transport.c: Sending smb: smb_len=72
Aug 10 04:18:18 linuxclient kernel: [36254.786218] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:18:18 linuxclient kernel: [36254.786961] fs/cifs/connect.c: RFC1002 header 0x44
Aug 10 04:18:18 linuxclient kernel: [36254.786967] fs/cifs/smb2misc.c: SMB2 len 68
Aug 10 04:18:18 linuxclient kernel: [36254.786970] fs/cifs/smb2ops.c: add 1 credits total=2353
Aug 10 04:19:19 linuxclient kernel: [36316.226158] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:19:19 linuxclient kernel: [36316.226162] fs/cifs/smb2pdu.c: In echo request
Aug 10 04:19:19 linuxclient kernel: [36316.226186] fs/cifs/transport.c: Sending smb: smb_len=72
Aug 10 04:19:19 linuxclient kernel: [36316.227037] fs/cifs/connect.c: RFC1002 header 0x44
Aug 10 04:19:19 linuxclient kernel: [36316.227045] fs/cifs/smb2misc.c: SMB2 len 68
Aug 10 04:19:19 linuxclient kernel: [36316.227049] fs/cifs/smb2ops.c: add 1 credits total=2353

My findings so far:
- this lockup (after 10h) *only* occurs when DFS is involved. Mounting a target share directly "survives" the ticket expiry
- this problem does *not* manifest with the newest kernel/cifs.ko combo (5.14.0-051400rc5-generic/2.33)

Therefore:
Could it be, that the DFS patches contributed by Paulo Alcantara on June 4 2021 (https://www.spinics.net/lists/linux-cifs/msg21999.html)
fixed this problem?

Two of his comments point into that direction: 
  [...]
  - keep SMB sessions alive as long as dfs mounts are actives in order
    to refresh cached entries by using IPC tcons.
  [...]
  - skip unnecessary tree disconnect of IPCs when shutting down SMB
    sessions (it didn't even work before).

Am I on the right track here? And if so: are there any plans to backport Paulo's patches to current kernels?

Thanks a lot & best regards
Thomas


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Kernel lockup upon dfs_cache_update
  2021-08-10 13:56 Kernel lockup upon dfs_cache_update Thomas Werschlein
@ 2021-08-10 15:02 ` Paulo Alcantara
  2021-08-27 12:37   ` Thomas Werschlein
  0 siblings, 1 reply; 3+ messages in thread
From: Paulo Alcantara @ 2021-08-10 15:02 UTC (permalink / raw)
  To: Thomas Werschlein, linux-cifs

Thomas Werschlein <thomas.werschlein@geo.uzh.ch> writes:

> Hi
>
> We are mounting SMB shares from Windows and FreeBSD servers to AD-joined (Ubuntu) Linux clients, using DFS and Kerberos.
>
> A sample /etc/fstab entry looks like this:
> //files.d.example.com/shared /files/shared cifs defaults,sec=krb5i,vers=3,seal,multiuser,noserverino,username=LINUXCLIENT$ 0 0
>
> The machine account LINUXCLIENT$ is able to browse the DFS tree and mount the shares.
> The credentials for the principal LINUXCLIENT$@D.EXAMPLE.COM are stored in the default keytab (/etc/krb5.keytab).
> In order to access a share a user needs a personal KRB ticket (either through kinit or GSSAPI delegated credentials upon ssh login).
>
> This works fine on Ubuntu 18.04 (4.15.0-153-generic Kernel, 2.10 cifs.ko), but doesn't on Ubuntu 20.04 (5.4.0-80-generic, 2.23).
> To be precise: it works for the first 10 hours (the ticket lifetime) even on Ubuntu 20.04.
>
> My findings so far:
> - this lockup (after 10h) *only* occurs when DFS is involved. Mounting a target share directly "survives" the ticket expiry
> - this problem does *not* manifest with the newest kernel/cifs.ko combo (5.14.0-051400rc5-generic/2.33)
>
> Therefore:
> Could it be, that the DFS patches contributed by Paulo Alcantara on June 4 2021 (https://www.spinics.net/lists/linux-cifs/msg21999.html)
> fixed this problem?

Yes, that is possible.  We found out some deadlock and race scenarios
but couldn't remember if the above was one of them.  Still worth having
those commits backported.

>
> Two of his comments point into that direction: 
>   [...]
>   - keep SMB sessions alive as long as dfs mounts are actives in order
>     to refresh cached entries by using IPC tcons.
>   [...]
>   - skip unnecessary tree disconnect of IPCs when shutting down SMB
>     sessions (it didn't even work before).
>
> Am I on the right track here? And if so: are there any plans to backport Paulo's patches to current kernels?

Yes.  I don't know if there any plans, but the only tricky part is
changing the commits to work with the old mount api on <5.11 kernels.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Kernel lockup upon dfs_cache_update
  2021-08-10 15:02 ` Paulo Alcantara
@ 2021-08-27 12:37   ` Thomas Werschlein
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Werschlein @ 2021-08-27 12:37 UTC (permalink / raw)
  To: Paulo Alcantara; +Cc: linux-cifs


> On Tue, Aug 10, 2021 at 5:02 PM Paulo Alcantara <pc@cjr.nz> wrote:
> 
> Thomas Werschlein <thomas.werschlein@geo.uzh.ch> writes:
> 
>> Hi
>> 
>> We are mounting SMB shares from Windows and FreeBSD servers to AD-joined (Ubuntu) Linux clients, using DFS and Kerberos.
>> 
>> A sample /etc/fstab entry looks like this:
>> //files.d.example.com/shared /files/shared cifs defaults,sec=krb5i,vers=3,seal,multiuser,noserverino,username=LINUXCLIENT$ 0 0
>> 
>> The machine account LINUXCLIENT$ is able to browse the DFS tree and mount the shares.
>> The credentials for the principal LINUXCLIENT$@D.EXAMPLE.COM are stored in the default keytab (/etc/krb5.keytab).
>> In order to access a share a user needs a personal KRB ticket (either through kinit or GSSAPI delegated credentials upon ssh login).
>> 
>> This works fine on Ubuntu 18.04 (4.15.0-153-generic Kernel, 2.10 cifs.ko), but doesn't on Ubuntu 20.04 (5.4.0-80-generic, 2.23).
>> To be precise: it works for the first 10 hours (the ticket lifetime) even on Ubuntu 20.04.
>> 
>> My findings so far:
>> - this lockup (after 10h) *only* occurs when DFS is involved. Mounting a target share directly "survives" the ticket expiry
>> - this problem does *not* manifest with the newest kernel/cifs.ko combo (5.14.0-051400rc5-generic/2.33)
>> 
>> Therefore:
>> Could it be, that the DFS patches contributed by Paulo Alcantara on June 4 2021 (https://www.spinics.net/lists/linux-cifs/msg21999.html)
>> fixed this problem?
> 
> Yes, that is possible.  We found out some deadlock and race scenarios
> but couldn't remember if the above was one of them.  Still worth having
> those commits backported.

Thanks for your (possible) confirmation.

> 
>> 
>> Two of his comments point into that direction: 
>>  [...]
>>  - keep SMB sessions alive as long as dfs mounts are actives in order
>>    to refresh cached entries by using IPC tcons.
>>  [...]
>>  - skip unnecessary tree disconnect of IPCs when shutting down SMB
>>    sessions (it didn't even work before).
>> 
>> Am I on the right track here? And if so: are there any plans to backport Paulo's patches to current kernels?
> 
> Yes.  I don't know if there any plans, but the only tricky part is
> changing the commits to work with the old mount api on <5.11 kernels.

I wasn't aware of Ubuntu's "LTS enablement stack" (https://wiki.ubuntu.com/Kernel/LTSEnablementStack).
A simple 'sudo apt install --install-recommends linux-generic-hwe-20.04' brings the latest LTS to Kernel 5.11.0-27 and cifs.ko to 2.30.

Strange enough (but gratefully so) this did solve my problem - the lockups don't manifest anymore.
I wouldn't have expected this, since your patches from June 4 brought cifs.ko to 2.31. Must be some earlier change then?

Anyway, our problem is gone. Thanks again.

Best regards
Thomas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-08-27 12:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-10 13:56 Kernel lockup upon dfs_cache_update Thomas Werschlein
2021-08-10 15:02 ` Paulo Alcantara
2021-08-27 12:37   ` Thomas Werschlein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).