linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Many processes end up in uninterruptible sleep accessing cifs mounts
@ 2019-07-03 18:12 Martijn de Gouw
  2019-07-03 19:17 ` Aurélien Aptel
  0 siblings, 1 reply; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-03 18:12 UTC (permalink / raw)
  To: linux-cifs

Hi,

On our production servers, we have a lot of issues with cifs mounts.
All mounts are mounted via the dfs shares on our domain controller.
We have mounts using sec=krb5, sec=ntlmssp and sec=krb5,multiuser

All mounts are vers=3.0.

One of the symptoms is that our monitoring system complains about not 
being able to stat() every now and then, the next scraping cycle, stat() 
works again. Even when the mounts are not accesses at all.

Also, lot of applications get stuck on either accessing data on the 
mounts, or performing stat() like operations on the mounts.

For us, the worst part is that applications end up in 'D'. The number of 
'D' processes pile up really quickly, blocking users from performing 
their work.

We are running Linux 4.20.17 SMP PREEMPT on all machines. We tried 
upgrading to > 5.x, but caused even more problems and kernel hangs.

I do not really have a clue where to start debugging. I enabled kernel 
debug options suggested on the wiki, but the amount of logging is 
immense now.

Can you provide any pointers where to look or start debugging?
Or any help on how to kill those D processes and get our Linux servers 
stable again?

Regards, Martijn de Gouw
-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-03 18:12 Many processes end up in uninterruptible sleep accessing cifs mounts Martijn de Gouw
@ 2019-07-03 19:17 ` Aurélien Aptel
  2019-07-03 19:24   ` Steve French
       [not found]   ` <1fc4f6d0-6cdc-69a5-4359-23484d6bdfc9@prodrive-technologies.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Aurélien Aptel @ 2019-07-03 19:17 UTC (permalink / raw)
  To: Martijn de Gouw, linux-cifs

Hi Martijn,

Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> One of the symptoms is that our monitoring system complains about not 
> being able to stat() every now and then, the next scraping cycle, stat() 
> works again. Even when the mounts are not accesses at all.
>
> Also, lot of applications get stuck on either accessing data on the 
> mounts, or performing stat() like operations on the mounts.
>
> For us, the worst part is that applications end up in 'D'. The number of 
> 'D' processes pile up really quickly, blocking users from performing 
> their work.

Gah, sorry to hear. Thanks for the report.

If you could pin down a specific way to reproduce some of those hangs
that would be of great help.

> We are running Linux 4.20.17 SMP PREEMPT on all machines. We tried 
> upgrading to > 5.x, but caused even more problems and kernel hangs.

5.0 and 5.1 really fixed a lot of issues with credits and reconnection.

> I do not really have a clue where to start debugging. I enabled kernel 
> debug options suggested on the wiki, but the amount of logging is 
> immense now.

Yes that is normal.

> Can you provide any pointers where to look or start debugging?
> Or any help on how to kill those D processes and get our Linux servers 
> stable again?

Are there any kernel oops/panic with stack traces and register dumps in
the log?

You can inspect the kernel stack trace of the hung processes (to see where
they are stuck) by printing the file /proc/<pid>/stack.

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-03 19:17 ` Aurélien Aptel
@ 2019-07-03 19:24   ` Steve French
       [not found]   ` <1fc4f6d0-6cdc-69a5-4359-23484d6bdfc9@prodrive-technologies.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Steve French @ 2019-07-03 19:24 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Martijn de Gouw, CIFS

Normally dynamic tracing is much easier and less noisy since you can
selectively turn on and off tracing with it (especially easier now
that good tutorials on how to use eBPF to make it even easier).

"trace-cmd record -e cifs"
then repro the problem
then in another process "trace-cmd show"

There are 70 events that can be individually turned on or off in
         /sys/kernel/debug/tracing/events/cifs

e.g. "trace-cmd record -e smb3_cmd_err -e smb3_enter -e smb3_exit_err"

Also can look at /proc/fs/cifs/Stats which shows how many of each
command and how many errors and if any reconnects

On Wed, Jul 3, 2019 at 2:18 PM Aurélien Aptel <aaptel@suse.com> wrote:
>
> Hi Martijn,
>
> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> > One of the symptoms is that our monitoring system complains about not
> > being able to stat() every now and then, the next scraping cycle, stat()
> > works again. Even when the mounts are not accesses at all.
> >
> > Also, lot of applications get stuck on either accessing data on the
> > mounts, or performing stat() like operations on the mounts.
> >
> > For us, the worst part is that applications end up in 'D'. The number of
> > 'D' processes pile up really quickly, blocking users from performing
> > their work.
>
> Gah, sorry to hear. Thanks for the report.
>
> If you could pin down a specific way to reproduce some of those hangs
> that would be of great help.
>
> > We are running Linux 4.20.17 SMP PREEMPT on all machines. We tried
> > upgrading to > 5.x, but caused even more problems and kernel hangs.
>
> 5.0 and 5.1 really fixed a lot of issues with credits and reconnection.
>
> > I do not really have a clue where to start debugging. I enabled kernel
> > debug options suggested on the wiki, but the amount of logging is
> > immense now.
>
> Yes that is normal.
>
> > Can you provide any pointers where to look or start debugging?
> > Or any help on how to kill those D processes and get our Linux servers
> > stable again?
>
> Are there any kernel oops/panic with stack traces and register dumps in
> the log?
>
> You can inspect the kernel stack trace of the hung processes (to see where
> they are stuck) by printing the file /proc/<pid>/stack.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
       [not found]   ` <1fc4f6d0-6cdc-69a5-4359-23484d6bdfc9@prodrive-technologies.com>
@ 2019-07-04 11:22     ` Aurélien Aptel
  2019-07-04 13:36       ` Martijn de Gouw
  2019-07-04 21:08       ` Pavel Shilovsky
  0 siblings, 2 replies; 16+ messages in thread
From: Aurélien Aptel @ 2019-07-04 11:22 UTC (permalink / raw)
  To: Martijn de Gouw, linux-cifs

Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>> Are there any kernel oops/panic with stack traces and register dumps in
>> the log?
>> 
>> You can inspect the kernel stack trace of the hung processes (to see where
>> they are stuck) by printing the file /proc/<pid>/stack.
>
> These are the stacks of all processes that are D, most of them being df.
> I also attached the cifs Stats output below.

Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
any?

The individual stack dumps are pretty useful. Here is my theory:

> pid: 9505
> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20 
> 0x7ffede42e8f8 0x7f7f8928f295
> [<0>] open_shroot+0x43/0x200 [cifs]
> [<0>] smb2_query_path_info+0x93/0x220 [cifs]

Almost all of the processes have the same stack trace. They are stuck at
open_shroot()+0x43 which is probably

    mutex_lock(&tcon->crfid.fid_mutex); 

Then there are only 2 other processes stuck somewhere in the same code path
(open_shroot) but deeper, meaning they have the locks that the other
processes are waiting for:


> pid: 22858
> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20 
> 0x7ffcea3f99d8 0x7f6cc78c7295
> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> [<0>] SMB2_open+0x150/0x520 [cifs]
> [<0>] open_shroot+0x12f/0x200 [cifs]
> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> [<0>] vfs_statx+0x89/0xe0
> [<0>] __do_sys_newstat+0x39/0x70
> [<0>] do_syscall_64+0x55/0x100
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff


> pid: 20027
> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a 
> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> [<0>] SMB2_open+0x150/0x520 [cifs]
> [<0>] open_shroot+0x12f/0x200 [cifs]
> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> [<0>] vfs_statx+0x89/0xe0
> [<0>] __do_sys_newstat+0x39/0x70
> [<0>] do_syscall_64+0x55/0x100
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff

Due to timeouts maybe the Open request needs
to reconnect the server/ses/tcon and to do this it calls
cifs_mark_open_files_invalid() and gets stuck somewhere there.

        spin_lock(&tcon->open_file_lock);
        list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
                open_file = list_entry(tmp, struct cifsFileInfo, tlist);
                open_file->invalidHandle = true;
                open_file->oplock_break_cancelled = true;
        }
        spin_unlock(&tcon->open_file_lock);

        mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
        tcon->crfid.is_valid = false;
        memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
        mutex_unlock(&tcon->crfid.fid_mutex);

I think these processes are trying to lock the same lock twice: one in
open_shroot() and since Open ends up having to reconnect, once again in
mark_open_files_invalid(). I think it's the same lock because I don't
see why the tcon pointers would be different in those 2 spots. Kernel
mutexes are not reentrant so this is a deadlock.

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-04 11:22     ` Aurélien Aptel
@ 2019-07-04 13:36       ` Martijn de Gouw
  2019-07-04 21:34         ` Pavel Shilovsky
  2019-07-04 21:08       ` Pavel Shilovsky
  1 sibling, 1 reply; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-04 13:36 UTC (permalink / raw)
  To: Aurélien Aptel, linux-cifs

Hi,

On 04-07-2019 13:22, Aurélien Aptel wrote:
> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>>> Are there any kernel oops/panic with stack traces and register dumps in
>>> the log?
>>>
>>> You can inspect the kernel stack trace of the hung processes (to see where
>>> they are stuck) by printing the file /proc/<pid>/stack.
>>
>> These are the stacks of all processes that are D, most of them being df.
>> I also attached the cifs Stats output below.
> 
> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> any?

I did find the following messages in the dmesg of one of the servers:

[    4.797893] FS-Cache: Duplicate cookie detected
[    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
[    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
[    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
[    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
[    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
[    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
[    4.798643] FS-Cache: Duplicate cookie detected
[    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
[    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
[    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
[    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
[    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
[    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
[    4.906667] FS-Cache: Netfs 'nfs' registered for caching
[12738.729173] CIFS VFS: Send error in SessSetup = -126
[99125.480751] CIFS VFS: Send error in SessSetup = -126
[185517.295175] CIFS VFS: Send error in SessSetup = -126
[250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
[250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
[250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
[250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
[250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
[250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
[250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
[250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
[250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
[250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
[250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
[250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
[250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
[250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
[250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
[250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[250515.750120] Call Trace:
[250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
[250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
[250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
[250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
[250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
[250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
[250515.750259]  ? finish_task_switch+0x7d/0x290
[250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
[250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
[250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[250515.751200]  ? kthread+0xf8/0x130
[250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
[250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
[250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
[250515.752997]  ? ret_from_fork+0x35/0x40
[250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
[250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
[250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
[250515.754766] CR2: ffff8ec52a6fe1d0
[250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[250515.759389] ---[ end trace 92ea62cd080150de ]---
[250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
[250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
[250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
[250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
[250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
[250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
[250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
[250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
[250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
[250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
[250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
[250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
[250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
[250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
[271909.743603] CIFS VFS: Send error in SessSetup = -126
[358301.908585] CIFS VFS: Send error in SessSetup = -126
[444693.566453] CIFS VFS: Send error in SessSetup = -126
[531086.090040] CIFS VFS: Send error in SessSetup = -126
[617477.785390] CIFS VFS: Send error in SessSetup = -126
[703869.556705] CIFS VFS: Send error in SessSetup = -126
[790261.656853] CIFS VFS: Send error in SessSetup = -126
[876653.496928] CIFS VFS: Send error in SessSetup = -126
[963045.816742] CIFS VFS: Send error in SessSetup = -126
[1049437.566219] CIFS VFS: Send error in SessSetup = -126


And from another server:
[    4.253091] FS-Cache: Duplicate cookie detected
[    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
[    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
[    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
[    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
[    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
[    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
[    4.254107] FS-Cache: Duplicate cookie detected
[    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
[    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
[    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
[    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
[    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
[    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
[    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
[    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
[65206.127542] CIFS VFS: Send error in SessSetup = -126
[151597.808064] CIFS VFS: Send error in SessSetup = -126
[237989.956447] CIFS VFS: Send error in SessSetup = -126
[324380.937984] CIFS VFS: Send error in SessSetup = -126
[402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
[402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
[402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
[402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
[402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
[402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[402750.869947] RIP: 0010:0xffff909bf61609d0
[402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
[402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
[402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
[402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
[402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
[402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
[402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
[402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
[402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
[402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
[402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
[402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
[402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
[402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
[402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
[402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
[402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
[402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[402750.872324] Call Trace:
[402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
[402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
[402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
[402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
[402750.876555]  ? try_to_wake_up+0x54/0x540
[402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
[402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
[402750.877834]  ? finish_task_switch+0x7d/0x290
[402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
[402750.878651]  ? kthread+0xf8/0x130
[402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
[402750.879412]  ? ret_from_fork+0x35/0x40
[402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
[402750.883384] CR2: ffff909bf61609d0
[402750.883783] ---[ end trace 08b06875e82513eb ]---
[402750.884200] RIP: 0010:0xffff909bf61609d0
[402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
[402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
[402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
[402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
[402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
[402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
[402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
[402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
[402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
[402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[410772.901264] CIFS VFS: Send error in SessSetup = -126
[497165.194662] CIFS VFS: Send error in SessSetup = -126
[583557.492304] CIFS VFS: Send error in SessSetup = -126
[669948.937016] CIFS VFS: Send error in SessSetup = -126
[756341.390112] CIFS VFS: Send error in SessSetup = -126
[842733.557002] CIFS VFS: Send error in SessSetup = -126
[929125.303428] CIFS VFS: Send error in SessSetup = -126
[1015516.629380] CIFS VFS: Send error in SessSetup = -126
[1101908.464697] CIFS VFS: Send error in SessSetup = -126
[1188300.531261] CIFS VFS: Send error in SessSetup = -126
[1274692.583049] CIFS VFS: Send error in SessSetup = -126


> 
> The individual stack dumps are pretty useful. Here is my theory:
> 
>> pid: 9505
>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
>> 0x7ffede42e8f8 0x7f7f8928f295
>> [<0>] open_shroot+0x43/0x200 [cifs]
>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> 
> Almost all of the processes have the same stack trace. They are stuck at
> open_shroot()+0x43 which is probably
> 
>      mutex_lock(&tcon->crfid.fid_mutex);
> 
> Then there are only 2 other processes stuck somewhere in the same code path
> (open_shroot) but deeper, meaning they have the locks that the other
> processes are waiting for:
> 
> 
>> pid: 22858
>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
>> 0x7ffcea3f99d8 0x7f6cc78c7295
>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>> [<0>] SMB2_open+0x150/0x520 [cifs]
>> [<0>] open_shroot+0x12f/0x200 [cifs]
>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>> [<0>] vfs_statx+0x89/0xe0
>> [<0>] __do_sys_newstat+0x39/0x70
>> [<0>] do_syscall_64+0x55/0x100
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [<0>] 0xffffffffffffffff
> 
> 
>> pid: 20027
>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>> [<0>] SMB2_open+0x150/0x520 [cifs]
>> [<0>] open_shroot+0x12f/0x200 [cifs]
>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>> [<0>] vfs_statx+0x89/0xe0
>> [<0>] __do_sys_newstat+0x39/0x70
>> [<0>] do_syscall_64+0x55/0x100
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [<0>] 0xffffffffffffffff
> 
> Due to timeouts maybe the Open request needs
> to reconnect the server/ses/tcon and to do this it calls
> cifs_mark_open_files_invalid() and gets stuck somewhere there.
> 
>          spin_lock(&tcon->open_file_lock);
>          list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>                  open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>                  open_file->invalidHandle = true;
>                  open_file->oplock_break_cancelled = true;
>          }
>          spin_unlock(&tcon->open_file_lock);
> 
>          mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>          tcon->crfid.is_valid = false;
>          memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>          mutex_unlock(&tcon->crfid.fid_mutex);
> 
> I think these processes are trying to lock the same lock twice: one in
> open_shroot() and since Open ends up having to reconnect, once again in
> mark_open_files_invalid(). I think it's the same lock because I don't
> see why the tcon pointers would be different in those 2 spots. Kernel
> mutexes are not reentrant so this is a deadlock.

Is there anything we can do about this? Is this maybe already fixed in newer kernels?

Regards, Martijn

> 
> Cheers,
> 

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-04 11:22     ` Aurélien Aptel
  2019-07-04 13:36       ` Martijn de Gouw
@ 2019-07-04 21:08       ` Pavel Shilovsky
  1 sibling, 0 replies; 16+ messages in thread
From: Pavel Shilovsky @ 2019-07-04 21:08 UTC (permalink / raw)
  To: Aurélien Aptel, ronnie sahlberg, Steve French
  Cc: Martijn de Gouw, linux-cifs

+ Ronnie, Steve

Good analysis, thanks!

One way to fix it is to drop the mutex before entering any PDU
functions, then acquire it again and double check if any concurrent
process has already initialize the handle. The complication here is
that the current kernel diverged from 4.20.y stable kernel in this
code path, so this would require separate patches for the current and
stable kernels.

--
Best regards,
Pavel Shilovsky

чт, 4 июл. 2019 г. в 04:24, Aurélien Aptel <aaptel@suse.com>:
>
> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> >> Are there any kernel oops/panic with stack traces and register dumps in
> >> the log?
> >>
> >> You can inspect the kernel stack trace of the hung processes (to see where
> >> they are stuck) by printing the file /proc/<pid>/stack.
> >
> > These are the stacks of all processes that are D, most of them being df.
> > I also attached the cifs Stats output below.
>
> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> any?
>
> The individual stack dumps are pretty useful. Here is my theory:
>
> > pid: 9505
> > syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
> > 0x7ffede42e8f8 0x7f7f8928f295
> > [<0>] open_shroot+0x43/0x200 [cifs]
> > [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>
> Almost all of the processes have the same stack trace. They are stuck at
> open_shroot()+0x43 which is probably
>
>     mutex_lock(&tcon->crfid.fid_mutex);
>
> Then there are only 2 other processes stuck somewhere in the same code path
> (open_shroot) but deeper, meaning they have the locks that the other
> processes are waiting for:
>
>
> > pid: 22858
> > syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
> > 0x7ffcea3f99d8 0x7f6cc78c7295
> > [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> > [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> > [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> > [<0>] SMB2_open+0x150/0x520 [cifs]
> > [<0>] open_shroot+0x12f/0x200 [cifs]
> > [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> > [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> > [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> > [<0>] vfs_statx+0x89/0xe0
> > [<0>] __do_sys_newstat+0x39/0x70
> > [<0>] do_syscall_64+0x55/0x100
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [<0>] 0xffffffffffffffff
>
>
> > pid: 20027
> > syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
> > 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> > [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> > [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> > [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> > [<0>] SMB2_open+0x150/0x520 [cifs]
> > [<0>] open_shroot+0x12f/0x200 [cifs]
> > [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> > [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> > [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> > [<0>] vfs_statx+0x89/0xe0
> > [<0>] __do_sys_newstat+0x39/0x70
> > [<0>] do_syscall_64+0x55/0x100
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [<0>] 0xffffffffffffffff
>
> Due to timeouts maybe the Open request needs
> to reconnect the server/ses/tcon and to do this it calls
> cifs_mark_open_files_invalid() and gets stuck somewhere there.
>
>         spin_lock(&tcon->open_file_lock);
>         list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>                 open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>                 open_file->invalidHandle = true;
>                 open_file->oplock_break_cancelled = true;
>         }
>         spin_unlock(&tcon->open_file_lock);
>
>         mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>         tcon->crfid.is_valid = false;
>         memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>         mutex_unlock(&tcon->crfid.fid_mutex);
>
> I think these processes are trying to lock the same lock twice: one in
> open_shroot() and since Open ends up having to reconnect, once again in
> mark_open_files_invalid(). I think it's the same lock because I don't
> see why the tcon pointers would be different in those 2 spots. Kernel
> mutexes are not reentrant so this is a deadlock.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-04 13:36       ` Martijn de Gouw
@ 2019-07-04 21:34         ` Pavel Shilovsky
  2019-07-06 11:11           ` Martijn de Gouw
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2019-07-04 21:34 UTC (permalink / raw)
  To: Martijn de Gouw; +Cc: Aurélien Aptel, linux-cifs

Hi Martijn,

Thanks for reporting the problem. Have you tried v5.1.5 kernel and
above? That one has many reconnect fixes. If yes, could you please
provide the kernel stack / panic traces when running new versions of
the kernel?

--
Best regards,
Pavel Shilovsky

чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
<martijn.de.gouw@prodrive-technologies.com>:
>
> Hi,
>
> On 04-07-2019 13:22, Aurélien Aptel wrote:
> > Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> >>> Are there any kernel oops/panic with stack traces and register dumps in
> >>> the log?
> >>>
> >>> You can inspect the kernel stack trace of the hung processes (to see where
> >>> they are stuck) by printing the file /proc/<pid>/stack.
> >>
> >> These are the stacks of all processes that are D, most of them being df.
> >> I also attached the cifs Stats output below.
> >
> > Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> > any?
>
> I did find the following messages in the dmesg of one of the servers:
>
> [    4.797893] FS-Cache: Duplicate cookie detected
> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
> [    4.798643] FS-Cache: Duplicate cookie detected
> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
> [12738.729173] CIFS VFS: Send error in SessSetup = -126
> [99125.480751] CIFS VFS: Send error in SessSetup = -126
> [185517.295175] CIFS VFS: Send error in SessSetup = -126
> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [250515.750120] Call Trace:
> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> [250515.750259]  ? finish_task_switch+0x7d/0x290
> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [250515.751200]  ? kthread+0xf8/0x130
> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
> [250515.752997]  ? ret_from_fork+0x35/0x40
> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
> [250515.754766] CR2: ffff8ec52a6fe1d0
> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [250515.759389] ---[ end trace 92ea62cd080150de ]---
> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
> [271909.743603] CIFS VFS: Send error in SessSetup = -126
> [358301.908585] CIFS VFS: Send error in SessSetup = -126
> [444693.566453] CIFS VFS: Send error in SessSetup = -126
> [531086.090040] CIFS VFS: Send error in SessSetup = -126
> [617477.785390] CIFS VFS: Send error in SessSetup = -126
> [703869.556705] CIFS VFS: Send error in SessSetup = -126
> [790261.656853] CIFS VFS: Send error in SessSetup = -126
> [876653.496928] CIFS VFS: Send error in SessSetup = -126
> [963045.816742] CIFS VFS: Send error in SessSetup = -126
> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
>
>
> And from another server:
> [    4.253091] FS-Cache: Duplicate cookie detected
> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
> [    4.254107] FS-Cache: Duplicate cookie detected
> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
> [65206.127542] CIFS VFS: Send error in SessSetup = -126
> [151597.808064] CIFS VFS: Send error in SessSetup = -126
> [237989.956447] CIFS VFS: Send error in SessSetup = -126
> [324380.937984] CIFS VFS: Send error in SessSetup = -126
> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [402750.869947] RIP: 0010:0xffff909bf61609d0
> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [402750.872324] Call Trace:
> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> [402750.876555]  ? try_to_wake_up+0x54/0x540
> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
> [402750.877834]  ? finish_task_switch+0x7d/0x290
> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [402750.878651]  ? kthread+0xf8/0x130
> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
> [402750.879412]  ? ret_from_fork+0x35/0x40
> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
> [402750.883384] CR2: ffff909bf61609d0
> [402750.883783] ---[ end trace 08b06875e82513eb ]---
> [402750.884200] RIP: 0010:0xffff909bf61609d0
> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [410772.901264] CIFS VFS: Send error in SessSetup = -126
> [497165.194662] CIFS VFS: Send error in SessSetup = -126
> [583557.492304] CIFS VFS: Send error in SessSetup = -126
> [669948.937016] CIFS VFS: Send error in SessSetup = -126
> [756341.390112] CIFS VFS: Send error in SessSetup = -126
> [842733.557002] CIFS VFS: Send error in SessSetup = -126
> [929125.303428] CIFS VFS: Send error in SessSetup = -126
> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
>
>
> >
> > The individual stack dumps are pretty useful. Here is my theory:
> >
> >> pid: 9505
> >> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
> >> 0x7ffede42e8f8 0x7f7f8928f295
> >> [<0>] open_shroot+0x43/0x200 [cifs]
> >> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >
> > Almost all of the processes have the same stack trace. They are stuck at
> > open_shroot()+0x43 which is probably
> >
> >      mutex_lock(&tcon->crfid.fid_mutex);
> >
> > Then there are only 2 other processes stuck somewhere in the same code path
> > (open_shroot) but deeper, meaning they have the locks that the other
> > processes are waiting for:
> >
> >
> >> pid: 22858
> >> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
> >> 0x7ffcea3f99d8 0x7f6cc78c7295
> >> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >> [<0>] SMB2_open+0x150/0x520 [cifs]
> >> [<0>] open_shroot+0x12f/0x200 [cifs]
> >> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >> [<0>] vfs_statx+0x89/0xe0
> >> [<0>] __do_sys_newstat+0x39/0x70
> >> [<0>] do_syscall_64+0x55/0x100
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [<0>] 0xffffffffffffffff
> >
> >
> >> pid: 20027
> >> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
> >> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> >> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >> [<0>] SMB2_open+0x150/0x520 [cifs]
> >> [<0>] open_shroot+0x12f/0x200 [cifs]
> >> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >> [<0>] vfs_statx+0x89/0xe0
> >> [<0>] __do_sys_newstat+0x39/0x70
> >> [<0>] do_syscall_64+0x55/0x100
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [<0>] 0xffffffffffffffff
> >
> > Due to timeouts maybe the Open request needs
> > to reconnect the server/ses/tcon and to do this it calls
> > cifs_mark_open_files_invalid() and gets stuck somewhere there.
> >
> >          spin_lock(&tcon->open_file_lock);
> >          list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
> >                  open_file = list_entry(tmp, struct cifsFileInfo, tlist);
> >                  open_file->invalidHandle = true;
> >                  open_file->oplock_break_cancelled = true;
> >          }
> >          spin_unlock(&tcon->open_file_lock);
> >
> >          mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
> >          tcon->crfid.is_valid = false;
> >          memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
> >          mutex_unlock(&tcon->crfid.fid_mutex);
> >
> > I think these processes are trying to lock the same lock twice: one in
> > open_shroot() and since Open ends up having to reconnect, once again in
> > mark_open_files_invalid(). I think it's the same lock because I don't
> > see why the tcon pointers would be different in those 2 spots. Kernel
> > mutexes are not reentrant so this is a deadlock.
>
> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
>
> Regards, Martijn
>
> >
> > Cheers,
> >
>
> --
> Martijn de Gouw
> Designer
> Prodrive Technologies
> Mobile: +31 63 17 76 161
> Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-04 21:34         ` Pavel Shilovsky
@ 2019-07-06 11:11           ` Martijn de Gouw
  2019-07-06 13:05             ` Steve French
  0 siblings, 1 reply; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-06 11:11 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Aurélien Aptel, linux-cifs

Hi Pavel,

On 04-07-2019 23:34, Pavel Shilovsky wrote:
> Hi Martijn,
> 
> Thanks for reporting the problem. Have you tried v5.1.5 kernel and
> above? That one has many reconnect fixes. If yes, could you please
> provide the kernel stack / panic traces when running new versions of
> the kernel?

I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.

Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
performed on these mounts is the check if they are still mounted (by puppet).


[36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
[36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36247.367648] node_exporter   D    0  3567      1 0x00000000
[36247.367650] Call Trace:
[36247.367668]  ? __schedule+0x338/0x850
[36247.367671]  ? preempt_count_add+0x67/0xb0
[36247.367673]  schedule+0x3c/0x90
[36247.367675]  schedule_preempt_disabled+0x14/0x20
[36247.367676]  __mutex_lock.isra.7+0x19f/0x540
[36247.367680]  ? try_module_get+0x68/0x100
[36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
[36247.367826]  ? __switch_to_asm+0x34/0x70
[36247.367827]  ? __switch_to_asm+0x40/0x70
[36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
[36247.367848]  ? lookup_fast+0xc8/0x2e0
[36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
[36247.367861]  ? lookup_fast+0xc8/0x2e0
[36247.367863]  ? ___cache_free+0x31/0x2e0
[36247.367865]  ? terminate_walk+0x93/0x100
[36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
[36247.367880]  cifs_statfs+0xad/0x300 [cifs]
[36247.367894]  statfs_by_dentry+0x6a/0x90
[36247.367895]  vfs_statfs+0x16/0xc0
[36247.367897]  user_statfs+0x50/0xa0
[36247.367898]  __do_sys_statfs+0x20/0x50
[36247.367902]  do_syscall_64+0x55/0x100
[36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36247.367905] RIP: 0033:0x4a5c20
[36247.367936] Code: Bad RIP value.
[36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
[36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
[36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
[36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
[36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
[36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
[36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36247.368008] kworker/1:2     D    0 32265      2 0x80000000
[36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
[36247.368021] Call Trace:
[36247.368023]  ? __schedule+0x338/0x850
[36247.368025]  schedule+0x3c/0x90
[36247.368026]  schedule_preempt_disabled+0x14/0x20
[36247.368028]  __mutex_lock.isra.7+0x19f/0x540
[36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
[36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
[36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36247.368055]  ? _raw_spin_unlock+0x16/0x30
[36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
[36247.368073]  ? _raw_spin_unlock+0x16/0x30
[36247.368074]  ? preempt_count_add+0x67/0xb0
[36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
[36247.368090]  process_one_work+0x188/0x3a0
[36247.368092]  worker_thread+0x4c/0x3a0
[36247.368094]  ? preempt_count_add+0x67/0xb0
[36247.368095]  ? process_one_work+0x3a0/0x3a0
[36247.368097]  kthread+0xf8/0x130
[36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
[36247.368101]  ret_from_fork+0x35/0x40
[36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
[36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36247.368158] puppet          D    0  3836    541 0x00000000
[36247.368159] Call Trace:
[36247.368161]  ? __schedule+0x338/0x850
[36247.368163]  schedule+0x3c/0x90
[36247.368164]  schedule_preempt_disabled+0x14/0x20
[36247.368165]  __mutex_lock.isra.7+0x19f/0x540
[36247.368166]  ? __switch_to_asm+0x34/0x70
[36247.368167]  ? __switch_to_asm+0x40/0x70
[36247.368169]  ? try_module_get+0x68/0x100
[36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
[36247.368189]  ? finish_task_switch+0x7d/0x290
[36247.368190]  ? __switch_to_asm+0x40/0x70
[36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
[36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
[36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36247.368209]  ? _raw_spin_lock+0x13/0x30
[36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
[36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
[36247.368235]  SMB2_open+0x150/0x520 [cifs]
[36247.368238]  ? sched_clock+0x5/0x10
[36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
[36247.368255]  open_shroot+0x12f/0x200 [cifs]
[36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
[36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
[36247.368269]  ? walk_component+0x48/0x360
[36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
[36247.368286]  ? filename_lookup+0xf8/0x1a0
[36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
[36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
[36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
[36247.368313]  vfs_statx+0x89/0xe0
[36247.368315]  __do_sys_newlstat+0x39/0x70
[36247.368318]  do_syscall_64+0x55/0x100
[36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36247.368321] RIP: 0033:0x7f562d3a0335
[36247.368323] Code: Bad RIP value.
[36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
[36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
[36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
[36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
[36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
[36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36368.187097] node_exporter   D    0  3567      1 0x00000000
[36368.187102] Call Trace:
[36368.187117]  ? __schedule+0x338/0x850
[36368.187123]  ? preempt_count_add+0x67/0xb0
[36368.187128]  schedule+0x3c/0x90
[36368.187133]  schedule_preempt_disabled+0x14/0x20
[36368.187138]  __mutex_lock.isra.7+0x19f/0x540
[36368.187175]  ? try_module_get+0x68/0x100
[36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
[36368.187304]  ? __switch_to_asm+0x34/0x70
[36368.187307]  ? __switch_to_asm+0x40/0x70
[36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
[36368.187375]  ? lookup_fast+0xc8/0x2e0
[36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
[36368.187410]  ? lookup_fast+0xc8/0x2e0
[36368.187416]  ? ___cache_free+0x31/0x2e0
[36368.187421]  ? terminate_walk+0x93/0x100
[36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
[36368.187471]  cifs_statfs+0xad/0x300 [cifs]
[36368.187477]  statfs_by_dentry+0x6a/0x90
[36368.187482]  vfs_statfs+0x16/0xc0
[36368.187486]  user_statfs+0x50/0xa0
[36368.187491]  __do_sys_statfs+0x20/0x50
[36368.187499]  do_syscall_64+0x55/0x100
[36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36368.187507] RIP: 0033:0x4a5c20
[36368.187517] Code: Bad RIP value.
[36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
[36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
[36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
[36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
[36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
[36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
[36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36368.187735] kworker/1:2     D    0 32265      2 0x80000000
[36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
[36368.187773] Call Trace:
[36368.187780]  ? __schedule+0x338/0x850
[36368.187785]  schedule+0x3c/0x90
[36368.187790]  schedule_preempt_disabled+0x14/0x20
[36368.187795]  __mutex_lock.isra.7+0x19f/0x540
[36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
[36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
[36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36368.187885]  ? _raw_spin_unlock+0x16/0x30
[36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
[36368.187942]  ? _raw_spin_unlock+0x16/0x30
[36368.187947]  ? preempt_count_add+0x67/0xb0
[36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
[36368.187984]  process_one_work+0x188/0x3a0
[36368.187990]  worker_thread+0x4c/0x3a0
[36368.187994]  ? preempt_count_add+0x67/0xb0
[36368.187998]  ? process_one_work+0x3a0/0x3a0
[36368.188001]  kthread+0xf8/0x130
[36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
[36368.188009]  ret_from_fork+0x35/0x40
[36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
[36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36368.188182] puppet          D    0  3836    541 0x00000000
[36368.188186] Call Trace:
[36368.188191]  ? __schedule+0x338/0x850
[36368.188196]  schedule+0x3c/0x90
[36368.188200]  schedule_preempt_disabled+0x14/0x20
[36368.188205]  __mutex_lock.isra.7+0x19f/0x540
[36368.188208]  ? __switch_to_asm+0x34/0x70
[36368.188211]  ? __switch_to_asm+0x40/0x70
[36368.188215]  ? try_module_get+0x68/0x100
[36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
[36368.188280]  ? finish_task_switch+0x7d/0x290
[36368.188283]  ? __switch_to_asm+0x40/0x70
[36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
[36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
[36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36368.188346]  ? _raw_spin_lock+0x13/0x30
[36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
[36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
[36368.188432]  SMB2_open+0x150/0x520 [cifs]
[36368.188440]  ? sched_clock+0x5/0x10
[36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
[36368.188497]  open_shroot+0x12f/0x200 [cifs]
[36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
[36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
[36368.188539]  ? walk_component+0x48/0x360
[36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
[36368.188574]  ? filename_lookup+0xf8/0x1a0
[36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
[36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
[36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
[36368.188664]  vfs_statx+0x89/0xe0
[36368.188670]  __do_sys_newlstat+0x39/0x70
[36368.188677]  do_syscall_64+0x55/0x100
[36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36368.188684] RIP: 0033:0x7f562d3a0335
[36368.188689] Code: Bad RIP value.
[36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
[36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
[36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
[36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
[36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
[36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36489.006458] node_exporter   D    0  3567      1 0x00000000
[36489.006461] Call Trace:
[36489.006470]  ? __schedule+0x338/0x850
[36489.006474]  ? preempt_count_add+0x67/0xb0
[36489.006476]  schedule+0x3c/0x90
[36489.006478]  schedule_preempt_disabled+0x14/0x20
[36489.006480]  __mutex_lock.isra.7+0x19f/0x540
[36489.006485]  ? try_module_get+0x68/0x100
[36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
[36489.006549]  ? __switch_to_asm+0x34/0x70
[36489.006550]  ? __switch_to_asm+0x40/0x70
[36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
[36489.006583]  ? lookup_fast+0xc8/0x2e0
[36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
[36489.006599]  ? lookup_fast+0xc8/0x2e0
[36489.006602]  ? ___cache_free+0x31/0x2e0
[36489.006604]  ? terminate_walk+0x93/0x100
[36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
[36489.006632]  cifs_statfs+0xad/0x300 [cifs]
[36489.006635]  statfs_by_dentry+0x6a/0x90
[36489.006637]  vfs_statfs+0x16/0xc0
[36489.006639]  user_statfs+0x50/0xa0
[36489.006641]  __do_sys_statfs+0x20/0x50
[36489.006645]  do_syscall_64+0x55/0x100
[36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36489.006649] RIP: 0033:0x4a5c20
[36489.006654] Code: Bad RIP value.
[36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
[36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
[36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
[36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
[36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
[36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
[36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36489.007076] kworker/1:2     D    0 32265      2 0x80000000
[36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
[36489.007101] Call Trace:
[36489.007105]  ? __schedule+0x338/0x850
[36489.007107]  schedule+0x3c/0x90
[36489.007109]  schedule_preempt_disabled+0x14/0x20
[36489.007111]  __mutex_lock.isra.7+0x19f/0x540
[36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
[36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
[36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36489.007143]  ? _raw_spin_unlock+0x16/0x30
[36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
[36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
[36489.007161]  ? _raw_spin_unlock+0x16/0x30
[36489.007163]  ? preempt_count_add+0x67/0xb0
[36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
[36489.007176]  process_one_work+0x188/0x3a0
[36489.007178]  worker_thread+0x4c/0x3a0
[36489.007179]  ? preempt_count_add+0x67/0xb0
[36489.007180]  ? process_one_work+0x3a0/0x3a0
[36489.007181]  kthread+0xf8/0x130
[36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
[36489.007184]  ret_from_fork+0x35/0x40
[36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
[36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36489.007540] puppet          D    0  3836    541 0x00000000
[36489.007541] Call Trace:
[36489.007545]  ? __schedule+0x338/0x850
[36489.007548]  schedule+0x3c/0x90
[36489.007550]  schedule_preempt_disabled+0x14/0x20
[36489.007552]  __mutex_lock.isra.7+0x19f/0x540
[36489.007553]  ? __switch_to_asm+0x34/0x70
[36489.007555]  ? __switch_to_asm+0x40/0x70
[36489.007558]  ? try_module_get+0x68/0x100
[36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
[36489.007598]  ? finish_task_switch+0x7d/0x290
[36489.007600]  ? __switch_to_asm+0x40/0x70
[36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
[36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
[36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36489.007630]  ? _raw_spin_lock+0x13/0x30
[36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
[36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
[36489.007672]  SMB2_open+0x150/0x520 [cifs]
[36489.007677]  ? sched_clock+0x5/0x10
[36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
[36489.007707]  open_shroot+0x12f/0x200 [cifs]
[36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
[36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
[36489.007729]  ? walk_component+0x48/0x360
[36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
[36489.007748]  ? filename_lookup+0xf8/0x1a0
[36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
[36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
[36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
[36489.007794]  vfs_statx+0x89/0xe0
[36489.007797]  __do_sys_newlstat+0x39/0x70
[36489.007801]  do_syscall_64+0x55/0x100
[36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36489.007805] RIP: 0033:0x7f562d3a0335
[36489.007808] Code: Bad RIP value.
[36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
[36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
[36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
[36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
[36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
[36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36609.825908] node_exporter   D    0  3567      1 0x00000000
[36609.825910] Call Trace:
[36609.825919]  ? __schedule+0x338/0x850
[36609.825922]  ? preempt_count_add+0x67/0xb0
[36609.825925]  schedule+0x3c/0x90
[36609.825927]  schedule_preempt_disabled+0x14/0x20
[36609.825929]  __mutex_lock.isra.7+0x19f/0x540
[36609.825933]  ? try_module_get+0x68/0x100
[36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
[36609.826001]  ? __switch_to_asm+0x34/0x70
[36609.826002]  ? __switch_to_asm+0x40/0x70
[36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
[36609.826040]  ? lookup_fast+0xc8/0x2e0
[36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
[36609.826060]  ? lookup_fast+0xc8/0x2e0
[36609.826064]  ? ___cache_free+0x31/0x2e0
[36609.826066]  ? terminate_walk+0x93/0x100
[36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
[36609.826099]  cifs_statfs+0xad/0x300 [cifs]
[36609.826103]  statfs_by_dentry+0x6a/0x90
[36609.826104]  vfs_statfs+0x16/0xc0
[36609.826106]  user_statfs+0x50/0xa0
[36609.826108]  __do_sys_statfs+0x20/0x50
[36609.826113]  do_syscall_64+0x55/0x100
[36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36609.826117] RIP: 0033:0x4a5c20
[36609.826123] Code: Bad RIP value.
[36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
[36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
[36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
[36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
[36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
[73117.654990] CIFS VFS: Send error in SessSetup = -126

Regards, Martijn

> 
> --
> Best regards,
> Pavel Shilovsky
> 
> чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
> <martijn.de.gouw@prodrive-technologies.com>:
>>
>> Hi,
>>
>> On 04-07-2019 13:22, Aurélien Aptel wrote:
>>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>>>>> Are there any kernel oops/panic with stack traces and register dumps in
>>>>> the log?
>>>>>
>>>>> You can inspect the kernel stack trace of the hung processes (to see where
>>>>> they are stuck) by printing the file /proc/<pid>/stack.
>>>>
>>>> These are the stacks of all processes that are D, most of them being df.
>>>> I also attached the cifs Stats output below.
>>>
>>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
>>> any?
>>
>> I did find the following messages in the dmesg of one of the servers:
>>
>> [    4.797893] FS-Cache: Duplicate cookie detected
>> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
>> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
>> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
>> [    4.798643] FS-Cache: Duplicate cookie detected
>> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
>> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
>> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
>> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
>> [12738.729173] CIFS VFS: Send error in SessSetup = -126
>> [99125.480751] CIFS VFS: Send error in SessSetup = -126
>> [185517.295175] CIFS VFS: Send error in SessSetup = -126
>> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
>> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
>> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
>> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
>> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [250515.750120] Call Trace:
>> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
>> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
>> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
>> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
>> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>> [250515.750259]  ? finish_task_switch+0x7d/0x290
>> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
>> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
>> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [250515.751200]  ? kthread+0xf8/0x130
>> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
>> [250515.752997]  ? ret_from_fork+0x35/0x40
>> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
>> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
>> [250515.754766] CR2: ffff8ec52a6fe1d0
>> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [250515.759389] ---[ end trace 92ea62cd080150de ]---
>> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
>> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
>> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
>> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
>> [271909.743603] CIFS VFS: Send error in SessSetup = -126
>> [358301.908585] CIFS VFS: Send error in SessSetup = -126
>> [444693.566453] CIFS VFS: Send error in SessSetup = -126
>> [531086.090040] CIFS VFS: Send error in SessSetup = -126
>> [617477.785390] CIFS VFS: Send error in SessSetup = -126
>> [703869.556705] CIFS VFS: Send error in SessSetup = -126
>> [790261.656853] CIFS VFS: Send error in SessSetup = -126
>> [876653.496928] CIFS VFS: Send error in SessSetup = -126
>> [963045.816742] CIFS VFS: Send error in SessSetup = -126
>> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
>>
>>
>> And from another server:
>> [    4.253091] FS-Cache: Duplicate cookie detected
>> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
>> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
>> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
>> [    4.254107] FS-Cache: Duplicate cookie detected
>> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
>> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
>> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
>> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
>> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
>> [65206.127542] CIFS VFS: Send error in SessSetup = -126
>> [151597.808064] CIFS VFS: Send error in SessSetup = -126
>> [237989.956447] CIFS VFS: Send error in SessSetup = -126
>> [324380.937984] CIFS VFS: Send error in SessSetup = -126
>> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
>> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
>> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
>> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [402750.869947] RIP: 0010:0xffff909bf61609d0
>> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
>> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
>> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
>> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
>> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [402750.872324] Call Trace:
>> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
>> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
>> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
>> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
>> [402750.876555]  ? try_to_wake_up+0x54/0x540
>> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
>> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
>> [402750.877834]  ? finish_task_switch+0x7d/0x290
>> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
>> [402750.878651]  ? kthread+0xf8/0x130
>> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [402750.879412]  ? ret_from_fork+0x35/0x40
>> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
>> [402750.883384] CR2: ffff909bf61609d0
>> [402750.883783] ---[ end trace 08b06875e82513eb ]---
>> [402750.884200] RIP: 0010:0xffff909bf61609d0
>> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [410772.901264] CIFS VFS: Send error in SessSetup = -126
>> [497165.194662] CIFS VFS: Send error in SessSetup = -126
>> [583557.492304] CIFS VFS: Send error in SessSetup = -126
>> [669948.937016] CIFS VFS: Send error in SessSetup = -126
>> [756341.390112] CIFS VFS: Send error in SessSetup = -126
>> [842733.557002] CIFS VFS: Send error in SessSetup = -126
>> [929125.303428] CIFS VFS: Send error in SessSetup = -126
>> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
>> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
>> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
>> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
>>
>>
>>>
>>> The individual stack dumps are pretty useful. Here is my theory:
>>>
>>>> pid: 9505
>>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
>>>> 0x7ffede42e8f8 0x7f7f8928f295
>>>> [<0>] open_shroot+0x43/0x200 [cifs]
>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>
>>> Almost all of the processes have the same stack trace. They are stuck at
>>> open_shroot()+0x43 which is probably
>>>
>>>       mutex_lock(&tcon->crfid.fid_mutex);
>>>
>>> Then there are only 2 other processes stuck somewhere in the same code path
>>> (open_shroot) but deeper, meaning they have the locks that the other
>>> processes are waiting for:
>>>
>>>
>>>> pid: 22858
>>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
>>>> 0x7ffcea3f99d8 0x7f6cc78c7295
>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>> [<0>] vfs_statx+0x89/0xe0
>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>> [<0>] do_syscall_64+0x55/0x100
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [<0>] 0xffffffffffffffff
>>>
>>>
>>>> pid: 20027
>>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
>>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>> [<0>] vfs_statx+0x89/0xe0
>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>> [<0>] do_syscall_64+0x55/0x100
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [<0>] 0xffffffffffffffff
>>>
>>> Due to timeouts maybe the Open request needs
>>> to reconnect the server/ses/tcon and to do this it calls
>>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
>>>
>>>           spin_lock(&tcon->open_file_lock);
>>>           list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>>>                   open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>>>                   open_file->invalidHandle = true;
>>>                   open_file->oplock_break_cancelled = true;
>>>           }
>>>           spin_unlock(&tcon->open_file_lock);
>>>
>>>           mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>>>           tcon->crfid.is_valid = false;
>>>           memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>>>           mutex_unlock(&tcon->crfid.fid_mutex);
>>>
>>> I think these processes are trying to lock the same lock twice: one in
>>> open_shroot() and since Open ends up having to reconnect, once again in
>>> mark_open_files_invalid(). I think it's the same lock because I don't
>>> see why the tcon pointers would be different in those 2 spots. Kernel
>>> mutexes are not reentrant so this is a deadlock.
>>
>> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
>>
>> Regards, Martijn
>>
>>>
>>> Cheers,
>>>
>>
>> --
>> Martijn de Gouw
>> Designer
>> Prodrive Technologies
>> Mobile: +31 63 17 76 161
>> Phone:  +31 40 26 76 200

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-06 11:11           ` Martijn de Gouw
@ 2019-07-06 13:05             ` Steve French
  2019-07-06 16:40               ` Pavel Shilovsky
  2019-07-08  9:04               ` Martijn de Gouw
  0 siblings, 2 replies; 16+ messages in thread
From: Steve French @ 2019-07-06 13:05 UTC (permalink / raw)
  To: Martijn de Gouw; +Cc: Pavel Shilovsky, Aurélien Aptel, linux-cifs

Are we sure this isn't something really simple like the server IP
address changing?  Remember in 5.0 kernel (also updated cifs-utils)
the code to allow failover when server's IP address changes was added.

On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
<martijn.de.gouw@prodrive-technologies.com> wrote:
>
> Hi Pavel,
>
> On 04-07-2019 23:34, Pavel Shilovsky wrote:
> > Hi Martijn,
> >
> > Thanks for reporting the problem. Have you tried v5.1.5 kernel and
> > above? That one has many reconnect fixes. If yes, could you please
> > provide the kernel stack / panic traces when running new versions of
> > the kernel?
>
> I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
>
> Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
> performed on these mounts is the check if they are still mounted (by puppet).
>
>
> [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36247.367648] node_exporter   D    0  3567      1 0x00000000
> [36247.367650] Call Trace:
> [36247.367668]  ? __schedule+0x338/0x850
> [36247.367671]  ? preempt_count_add+0x67/0xb0
> [36247.367673]  schedule+0x3c/0x90
> [36247.367675]  schedule_preempt_disabled+0x14/0x20
> [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
> [36247.367680]  ? try_module_get+0x68/0x100
> [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36247.367826]  ? __switch_to_asm+0x34/0x70
> [36247.367827]  ? __switch_to_asm+0x40/0x70
> [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
> [36247.367848]  ? lookup_fast+0xc8/0x2e0
> [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
> [36247.367861]  ? lookup_fast+0xc8/0x2e0
> [36247.367863]  ? ___cache_free+0x31/0x2e0
> [36247.367865]  ? terminate_walk+0x93/0x100
> [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
> [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
> [36247.367894]  statfs_by_dentry+0x6a/0x90
> [36247.367895]  vfs_statfs+0x16/0xc0
> [36247.367897]  user_statfs+0x50/0xa0
> [36247.367898]  __do_sys_statfs+0x20/0x50
> [36247.367902]  do_syscall_64+0x55/0x100
> [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36247.367905] RIP: 0033:0x4a5c20
> [36247.367936] Code: Bad RIP value.
> [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
> [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
> [36247.368021] Call Trace:
> [36247.368023]  ? __schedule+0x338/0x850
> [36247.368025]  schedule+0x3c/0x90
> [36247.368026]  schedule_preempt_disabled+0x14/0x20
> [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
> [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36247.368055]  ? _raw_spin_unlock+0x16/0x30
> [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> [36247.368073]  ? _raw_spin_unlock+0x16/0x30
> [36247.368074]  ? preempt_count_add+0x67/0xb0
> [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> [36247.368090]  process_one_work+0x188/0x3a0
> [36247.368092]  worker_thread+0x4c/0x3a0
> [36247.368094]  ? preempt_count_add+0x67/0xb0
> [36247.368095]  ? process_one_work+0x3a0/0x3a0
> [36247.368097]  kthread+0xf8/0x130
> [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
> [36247.368101]  ret_from_fork+0x35/0x40
> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36247.368158] puppet          D    0  3836    541 0x00000000
> [36247.368159] Call Trace:
> [36247.368161]  ? __schedule+0x338/0x850
> [36247.368163]  schedule+0x3c/0x90
> [36247.368164]  schedule_preempt_disabled+0x14/0x20
> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
> [36247.368166]  ? __switch_to_asm+0x34/0x70
> [36247.368167]  ? __switch_to_asm+0x40/0x70
> [36247.368169]  ? try_module_get+0x68/0x100
> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
> [36247.368189]  ? finish_task_switch+0x7d/0x290
> [36247.368190]  ? __switch_to_asm+0x40/0x70
> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36247.368209]  ? _raw_spin_lock+0x13/0x30
> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
> [36247.368238]  ? sched_clock+0x5/0x10
> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
> [36247.368269]  ? walk_component+0x48/0x360
> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
> [36247.368286]  ? filename_lookup+0xf8/0x1a0
> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
> [36247.368313]  vfs_statx+0x89/0xe0
> [36247.368315]  __do_sys_newlstat+0x39/0x70
> [36247.368318]  do_syscall_64+0x55/0x100
> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36247.368321] RIP: 0033:0x7f562d3a0335
> [36247.368323] Code: Bad RIP value.
> [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36368.187097] node_exporter   D    0  3567      1 0x00000000
> [36368.187102] Call Trace:
> [36368.187117]  ? __schedule+0x338/0x850
> [36368.187123]  ? preempt_count_add+0x67/0xb0
> [36368.187128]  schedule+0x3c/0x90
> [36368.187133]  schedule_preempt_disabled+0x14/0x20
> [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
> [36368.187175]  ? try_module_get+0x68/0x100
> [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36368.187304]  ? __switch_to_asm+0x34/0x70
> [36368.187307]  ? __switch_to_asm+0x40/0x70
> [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
> [36368.187375]  ? lookup_fast+0xc8/0x2e0
> [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
> [36368.187410]  ? lookup_fast+0xc8/0x2e0
> [36368.187416]  ? ___cache_free+0x31/0x2e0
> [36368.187421]  ? terminate_walk+0x93/0x100
> [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
> [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
> [36368.187477]  statfs_by_dentry+0x6a/0x90
> [36368.187482]  vfs_statfs+0x16/0xc0
> [36368.187486]  user_statfs+0x50/0xa0
> [36368.187491]  __do_sys_statfs+0x20/0x50
> [36368.187499]  do_syscall_64+0x55/0x100
> [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36368.187507] RIP: 0033:0x4a5c20
> [36368.187517] Code: Bad RIP value.
> [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
> [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
> [36368.187773] Call Trace:
> [36368.187780]  ? __schedule+0x338/0x850
> [36368.187785]  schedule+0x3c/0x90
> [36368.187790]  schedule_preempt_disabled+0x14/0x20
> [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
> [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36368.187885]  ? _raw_spin_unlock+0x16/0x30
> [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> [36368.187942]  ? _raw_spin_unlock+0x16/0x30
> [36368.187947]  ? preempt_count_add+0x67/0xb0
> [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> [36368.187984]  process_one_work+0x188/0x3a0
> [36368.187990]  worker_thread+0x4c/0x3a0
> [36368.187994]  ? preempt_count_add+0x67/0xb0
> [36368.187998]  ? process_one_work+0x3a0/0x3a0
> [36368.188001]  kthread+0xf8/0x130
> [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
> [36368.188009]  ret_from_fork+0x35/0x40
> [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
> [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36368.188182] puppet          D    0  3836    541 0x00000000
> [36368.188186] Call Trace:
> [36368.188191]  ? __schedule+0x338/0x850
> [36368.188196]  schedule+0x3c/0x90
> [36368.188200]  schedule_preempt_disabled+0x14/0x20
> [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
> [36368.188208]  ? __switch_to_asm+0x34/0x70
> [36368.188211]  ? __switch_to_asm+0x40/0x70
> [36368.188215]  ? try_module_get+0x68/0x100
> [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
> [36368.188280]  ? finish_task_switch+0x7d/0x290
> [36368.188283]  ? __switch_to_asm+0x40/0x70
> [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
> [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36368.188346]  ? _raw_spin_lock+0x13/0x30
> [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
> [36368.188432]  SMB2_open+0x150/0x520 [cifs]
> [36368.188440]  ? sched_clock+0x5/0x10
> [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
> [36368.188497]  open_shroot+0x12f/0x200 [cifs]
> [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
> [36368.188539]  ? walk_component+0x48/0x360
> [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
> [36368.188574]  ? filename_lookup+0xf8/0x1a0
> [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
> [36368.188664]  vfs_statx+0x89/0xe0
> [36368.188670]  __do_sys_newlstat+0x39/0x70
> [36368.188677]  do_syscall_64+0x55/0x100
> [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36368.188684] RIP: 0033:0x7f562d3a0335
> [36368.188689] Code: Bad RIP value.
> [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36489.006458] node_exporter   D    0  3567      1 0x00000000
> [36489.006461] Call Trace:
> [36489.006470]  ? __schedule+0x338/0x850
> [36489.006474]  ? preempt_count_add+0x67/0xb0
> [36489.006476]  schedule+0x3c/0x90
> [36489.006478]  schedule_preempt_disabled+0x14/0x20
> [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
> [36489.006485]  ? try_module_get+0x68/0x100
> [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36489.006549]  ? __switch_to_asm+0x34/0x70
> [36489.006550]  ? __switch_to_asm+0x40/0x70
> [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
> [36489.006583]  ? lookup_fast+0xc8/0x2e0
> [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
> [36489.006599]  ? lookup_fast+0xc8/0x2e0
> [36489.006602]  ? ___cache_free+0x31/0x2e0
> [36489.006604]  ? terminate_walk+0x93/0x100
> [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
> [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
> [36489.006635]  statfs_by_dentry+0x6a/0x90
> [36489.006637]  vfs_statfs+0x16/0xc0
> [36489.006639]  user_statfs+0x50/0xa0
> [36489.006641]  __do_sys_statfs+0x20/0x50
> [36489.006645]  do_syscall_64+0x55/0x100
> [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36489.006649] RIP: 0033:0x4a5c20
> [36489.006654] Code: Bad RIP value.
> [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
> [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
> [36489.007101] Call Trace:
> [36489.007105]  ? __schedule+0x338/0x850
> [36489.007107]  schedule+0x3c/0x90
> [36489.007109]  schedule_preempt_disabled+0x14/0x20
> [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
> [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36489.007143]  ? _raw_spin_unlock+0x16/0x30
> [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> [36489.007161]  ? _raw_spin_unlock+0x16/0x30
> [36489.007163]  ? preempt_count_add+0x67/0xb0
> [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> [36489.007176]  process_one_work+0x188/0x3a0
> [36489.007178]  worker_thread+0x4c/0x3a0
> [36489.007179]  ? preempt_count_add+0x67/0xb0
> [36489.007180]  ? process_one_work+0x3a0/0x3a0
> [36489.007181]  kthread+0xf8/0x130
> [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
> [36489.007184]  ret_from_fork+0x35/0x40
> [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
> [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36489.007540] puppet          D    0  3836    541 0x00000000
> [36489.007541] Call Trace:
> [36489.007545]  ? __schedule+0x338/0x850
> [36489.007548]  schedule+0x3c/0x90
> [36489.007550]  schedule_preempt_disabled+0x14/0x20
> [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
> [36489.007553]  ? __switch_to_asm+0x34/0x70
> [36489.007555]  ? __switch_to_asm+0x40/0x70
> [36489.007558]  ? try_module_get+0x68/0x100
> [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
> [36489.007598]  ? finish_task_switch+0x7d/0x290
> [36489.007600]  ? __switch_to_asm+0x40/0x70
> [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
> [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36489.007630]  ? _raw_spin_lock+0x13/0x30
> [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
> [36489.007672]  SMB2_open+0x150/0x520 [cifs]
> [36489.007677]  ? sched_clock+0x5/0x10
> [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
> [36489.007707]  open_shroot+0x12f/0x200 [cifs]
> [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
> [36489.007729]  ? walk_component+0x48/0x360
> [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
> [36489.007748]  ? filename_lookup+0xf8/0x1a0
> [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
> [36489.007794]  vfs_statx+0x89/0xe0
> [36489.007797]  __do_sys_newlstat+0x39/0x70
> [36489.007801]  do_syscall_64+0x55/0x100
> [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36489.007805] RIP: 0033:0x7f562d3a0335
> [36489.007808] Code: Bad RIP value.
> [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36609.825908] node_exporter   D    0  3567      1 0x00000000
> [36609.825910] Call Trace:
> [36609.825919]  ? __schedule+0x338/0x850
> [36609.825922]  ? preempt_count_add+0x67/0xb0
> [36609.825925]  schedule+0x3c/0x90
> [36609.825927]  schedule_preempt_disabled+0x14/0x20
> [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
> [36609.825933]  ? try_module_get+0x68/0x100
> [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36609.826001]  ? __switch_to_asm+0x34/0x70
> [36609.826002]  ? __switch_to_asm+0x40/0x70
> [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
> [36609.826040]  ? lookup_fast+0xc8/0x2e0
> [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
> [36609.826060]  ? lookup_fast+0xc8/0x2e0
> [36609.826064]  ? ___cache_free+0x31/0x2e0
> [36609.826066]  ? terminate_walk+0x93/0x100
> [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
> [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
> [36609.826103]  statfs_by_dentry+0x6a/0x90
> [36609.826104]  vfs_statfs+0x16/0xc0
> [36609.826106]  user_statfs+0x50/0xa0
> [36609.826108]  __do_sys_statfs+0x20/0x50
> [36609.826113]  do_syscall_64+0x55/0x100
> [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36609.826117] RIP: 0033:0x4a5c20
> [36609.826123] Code: Bad RIP value.
> [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> [73117.654990] CIFS VFS: Send error in SessSetup = -126
>
> Regards, Martijn
>
> >
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
> > <martijn.de.gouw@prodrive-technologies.com>:
> >>
> >> Hi,
> >>
> >> On 04-07-2019 13:22, Aurélien Aptel wrote:
> >>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> >>>>> Are there any kernel oops/panic with stack traces and register dumps in
> >>>>> the log?
> >>>>>
> >>>>> You can inspect the kernel stack trace of the hung processes (to see where
> >>>>> they are stuck) by printing the file /proc/<pid>/stack.
> >>>>
> >>>> These are the stacks of all processes that are D, most of them being df.
> >>>> I also attached the cifs Stats output below.
> >>>
> >>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> >>> any?
> >>
> >> I did find the following messages in the dmesg of one of the servers:
> >>
> >> [    4.797893] FS-Cache: Duplicate cookie detected
> >> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> >> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> >> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
> >> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> >> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
> >> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
> >> [    4.798643] FS-Cache: Duplicate cookie detected
> >> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> >> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> >> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
> >> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> >> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
> >> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
> >> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
> >> [12738.729173] CIFS VFS: Send error in SessSetup = -126
> >> [99125.480751] CIFS VFS: Send error in SessSetup = -126
> >> [185517.295175] CIFS VFS: Send error in SessSetup = -126
> >> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
> >> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
> >> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
> >> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> >> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
> >> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> >> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> >> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> >> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> >> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> >> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> >> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> >> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> >> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> >> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> [250515.750120] Call Trace:
> >> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> >> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
> >> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
> >> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
> >> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
> >> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >> [250515.750259]  ? finish_task_switch+0x7d/0x290
> >> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
> >> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
> >> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >> [250515.751200]  ? kthread+0xf8/0x130
> >> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> >> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
> >> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
> >> [250515.752997]  ? ret_from_fork+0x35/0x40
> >> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
> >> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
> >> [250515.754766] CR2: ffff8ec52a6fe1d0
> >> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >> [250515.759389] ---[ end trace 92ea62cd080150de ]---
> >> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
> >> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
> >> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> >> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> >> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> >> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
> >> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> >> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> >> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> >> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> >> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> >> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> >> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
> >> [271909.743603] CIFS VFS: Send error in SessSetup = -126
> >> [358301.908585] CIFS VFS: Send error in SessSetup = -126
> >> [444693.566453] CIFS VFS: Send error in SessSetup = -126
> >> [531086.090040] CIFS VFS: Send error in SessSetup = -126
> >> [617477.785390] CIFS VFS: Send error in SessSetup = -126
> >> [703869.556705] CIFS VFS: Send error in SessSetup = -126
> >> [790261.656853] CIFS VFS: Send error in SessSetup = -126
> >> [876653.496928] CIFS VFS: Send error in SessSetup = -126
> >> [963045.816742] CIFS VFS: Send error in SessSetup = -126
> >> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
> >>
> >>
> >> And from another server:
> >> [    4.253091] FS-Cache: Duplicate cookie detected
> >> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> >> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> >> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
> >> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> >> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
> >> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
> >> [    4.254107] FS-Cache: Duplicate cookie detected
> >> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> >> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> >> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
> >> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> >> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
> >> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
> >> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
> >> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
> >> [65206.127542] CIFS VFS: Send error in SessSetup = -126
> >> [151597.808064] CIFS VFS: Send error in SessSetup = -126
> >> [237989.956447] CIFS VFS: Send error in SessSetup = -126
> >> [324380.937984] CIFS VFS: Send error in SessSetup = -126
> >> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
> >> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
> >> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
> >> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> >> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >> [402750.869947] RIP: 0010:0xffff909bf61609d0
> >> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> >> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> >> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
> >> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> >> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> >> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> >> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> >> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
> >> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> >> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> >> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> >> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> >> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> >> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> >> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
> >> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> >> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
> >> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> [402750.872324] Call Trace:
> >> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
> >> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
> >> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
> >> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> >> [402750.876555]  ? try_to_wake_up+0x54/0x540
> >> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
> >> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
> >> [402750.877834]  ? finish_task_switch+0x7d/0x290
> >> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
> >> [402750.878651]  ? kthread+0xf8/0x130
> >> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
> >> [402750.879412]  ? ret_from_fork+0x35/0x40
> >> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
> >> [402750.883384] CR2: ffff909bf61609d0
> >> [402750.883783] ---[ end trace 08b06875e82513eb ]---
> >> [402750.884200] RIP: 0010:0xffff909bf61609d0
> >> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> >> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> >> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> >> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> >> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> >> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> >> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> >> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> >> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> >> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> [410772.901264] CIFS VFS: Send error in SessSetup = -126
> >> [497165.194662] CIFS VFS: Send error in SessSetup = -126
> >> [583557.492304] CIFS VFS: Send error in SessSetup = -126
> >> [669948.937016] CIFS VFS: Send error in SessSetup = -126
> >> [756341.390112] CIFS VFS: Send error in SessSetup = -126
> >> [842733.557002] CIFS VFS: Send error in SessSetup = -126
> >> [929125.303428] CIFS VFS: Send error in SessSetup = -126
> >> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
> >> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
> >> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
> >> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
> >>
> >>
> >>>
> >>> The individual stack dumps are pretty useful. Here is my theory:
> >>>
> >>>> pid: 9505
> >>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
> >>>> 0x7ffede42e8f8 0x7f7f8928f295
> >>>> [<0>] open_shroot+0x43/0x200 [cifs]
> >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>
> >>> Almost all of the processes have the same stack trace. They are stuck at
> >>> open_shroot()+0x43 which is probably
> >>>
> >>>       mutex_lock(&tcon->crfid.fid_mutex);
> >>>
> >>> Then there are only 2 other processes stuck somewhere in the same code path
> >>> (open_shroot) but deeper, meaning they have the locks that the other
> >>> processes are waiting for:
> >>>
> >>>
> >>>> pid: 22858
> >>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
> >>>> 0x7ffcea3f99d8 0x7f6cc78c7295
> >>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> >>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >>>> [<0>] vfs_statx+0x89/0xe0
> >>>> [<0>] __do_sys_newstat+0x39/0x70
> >>>> [<0>] do_syscall_64+0x55/0x100
> >>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>> [<0>] 0xffffffffffffffff
> >>>
> >>>
> >>>> pid: 20027
> >>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
> >>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> >>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> >>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >>>> [<0>] vfs_statx+0x89/0xe0
> >>>> [<0>] __do_sys_newstat+0x39/0x70
> >>>> [<0>] do_syscall_64+0x55/0x100
> >>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>> [<0>] 0xffffffffffffffff
> >>>
> >>> Due to timeouts maybe the Open request needs
> >>> to reconnect the server/ses/tcon and to do this it calls
> >>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
> >>>
> >>>           spin_lock(&tcon->open_file_lock);
> >>>           list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
> >>>                   open_file = list_entry(tmp, struct cifsFileInfo, tlist);
> >>>                   open_file->invalidHandle = true;
> >>>                   open_file->oplock_break_cancelled = true;
> >>>           }
> >>>           spin_unlock(&tcon->open_file_lock);
> >>>
> >>>           mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
> >>>           tcon->crfid.is_valid = false;
> >>>           memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
> >>>           mutex_unlock(&tcon->crfid.fid_mutex);
> >>>
> >>> I think these processes are trying to lock the same lock twice: one in
> >>> open_shroot() and since Open ends up having to reconnect, once again in
> >>> mark_open_files_invalid(). I think it's the same lock because I don't
> >>> see why the tcon pointers would be different in those 2 spots. Kernel
> >>> mutexes are not reentrant so this is a deadlock.
> >>
> >> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
> >>
> >> Regards, Martijn
> >>
> >>>
> >>> Cheers,
> >>>
> >>
> >> --
> >> Martijn de Gouw
> >> Designer
> >> Prodrive Technologies
> >> Mobile: +31 63 17 76 161
> >> Phone:  +31 40 26 76 200
>
> --
> Martijn de Gouw
> Designer
> Prodrive Technologies
> Mobile: +31 63 17 76 161
> Phone:  +31 40 26 76 200



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-06 13:05             ` Steve French
@ 2019-07-06 16:40               ` Pavel Shilovsky
  2019-07-09  8:48                 ` Martijn de Gouw
  2019-07-08  9:04               ` Martijn de Gouw
  1 sibling, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2019-07-06 16:40 UTC (permalink / raw)
  To: Steve French; +Cc: Martijn de Gouw, Aurélien Aptel, linux-cifs

Martijn, thanks for the traces. This is a deadlock as mentioned by Aurelien:

[36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
[36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
[36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[36247.368158] puppet          D    0  3836    541 0x00000000
[36247.368159] Call Trace:
[36247.368161]  ? __schedule+0x338/0x850
[36247.368163]  schedule+0x3c/0x90
[36247.368164]  schedule_preempt_disabled+0x14/0x20
[36247.368165]  __mutex_lock.isra.7+0x19f/0x540
[36247.368166]  ? __switch_to_asm+0x34/0x70
[36247.368167]  ? __switch_to_asm+0x40/0x70
[36247.368169]  ? try_module_get+0x68/0x100
[36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
[36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
[36247.368189]  ? finish_task_switch+0x7d/0x290
[36247.368190]  ? __switch_to_asm+0x40/0x70
[36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
[36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
[36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
[36247.368209]  ? _raw_spin_lock+0x13/0x30
[36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
[36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
[36247.368235]  SMB2_open+0x150/0x520 [cifs]
[36247.368238]  ? sched_clock+0x5/0x10
[36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
[36247.368255]  open_shroot+0x12f/0x200 [cifs]
[36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
[36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
[36247.368269]  ? walk_component+0x48/0x360
[36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
[36247.368286]  ? filename_lookup+0xf8/0x1a0
[36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
[36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
[36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
[36247.368313]  vfs_statx+0x89/0xe0
[36247.368315]  __do_sys_newlstat+0x39/0x70
[36247.368318]  do_syscall_64+0x55/0x100
[36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[36247.368321] RIP: 0033:0x7f562d3a0335
[36247.368323] Code: Bad RIP value.

This looks urgent to fix - we have a parallel thread with the similar
problem observed on 5.1 kernel. In order to fix it we need to drop
open shroot mutex before going into SMB2_open_init and other PDU
functions.

You can try to work around this by specifying "nohandlecache" mount option.

--
Best regards,
Pavel Shilovsky

сб, 6 июл. 2019 г. в 06:05, Steve French <smfrench@gmail.com>:
>
> Are we sure this isn't something really simple like the server IP
> address changing?  Remember in 5.0 kernel (also updated cifs-utils)
> the code to allow failover when server's IP address changes was added.
>
> On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
> <martijn.de.gouw@prodrive-technologies.com> wrote:
> >
> > Hi Pavel,
> >
> > On 04-07-2019 23:34, Pavel Shilovsky wrote:
> > > Hi Martijn,
> > >
> > > Thanks for reporting the problem. Have you tried v5.1.5 kernel and
> > > above? That one has many reconnect fixes. If yes, could you please
> > > provide the kernel stack / panic traces when running new versions of
> > > the kernel?
> >
> > I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
> >
> > Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
> > performed on these mounts is the check if they are still mounted (by puppet).
> >
> >
> > [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> > [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36247.367648] node_exporter   D    0  3567      1 0x00000000
> > [36247.367650] Call Trace:
> > [36247.367668]  ? __schedule+0x338/0x850
> > [36247.367671]  ? preempt_count_add+0x67/0xb0
> > [36247.367673]  schedule+0x3c/0x90
> > [36247.367675]  schedule_preempt_disabled+0x14/0x20
> > [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
> > [36247.367680]  ? try_module_get+0x68/0x100
> > [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36247.367826]  ? __switch_to_asm+0x34/0x70
> > [36247.367827]  ? __switch_to_asm+0x40/0x70
> > [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
> > [36247.367848]  ? lookup_fast+0xc8/0x2e0
> > [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
> > [36247.367861]  ? lookup_fast+0xc8/0x2e0
> > [36247.367863]  ? ___cache_free+0x31/0x2e0
> > [36247.367865]  ? terminate_walk+0x93/0x100
> > [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
> > [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
> > [36247.367894]  statfs_by_dentry+0x6a/0x90
> > [36247.367895]  vfs_statfs+0x16/0xc0
> > [36247.367897]  user_statfs+0x50/0xa0
> > [36247.367898]  __do_sys_statfs+0x20/0x50
> > [36247.367902]  do_syscall_64+0x55/0x100
> > [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36247.367905] RIP: 0033:0x4a5c20
> > [36247.367936] Code: Bad RIP value.
> > [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> > [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> > [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> > [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> > [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> > [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> > [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> > [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
> > [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
> > [36247.368021] Call Trace:
> > [36247.368023]  ? __schedule+0x338/0x850
> > [36247.368025]  schedule+0x3c/0x90
> > [36247.368026]  schedule_preempt_disabled+0x14/0x20
> > [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
> > [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> > [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> > [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36247.368055]  ? _raw_spin_unlock+0x16/0x30
> > [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> > [36247.368073]  ? _raw_spin_unlock+0x16/0x30
> > [36247.368074]  ? preempt_count_add+0x67/0xb0
> > [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> > [36247.368090]  process_one_work+0x188/0x3a0
> > [36247.368092]  worker_thread+0x4c/0x3a0
> > [36247.368094]  ? preempt_count_add+0x67/0xb0
> > [36247.368095]  ? process_one_work+0x3a0/0x3a0
> > [36247.368097]  kthread+0xf8/0x130
> > [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
> > [36247.368101]  ret_from_fork+0x35/0x40
> > [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
> > [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36247.368158] puppet          D    0  3836    541 0x00000000
> > [36247.368159] Call Trace:
> > [36247.368161]  ? __schedule+0x338/0x850
> > [36247.368163]  schedule+0x3c/0x90
> > [36247.368164]  schedule_preempt_disabled+0x14/0x20
> > [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
> > [36247.368166]  ? __switch_to_asm+0x34/0x70
> > [36247.368167]  ? __switch_to_asm+0x40/0x70
> > [36247.368169]  ? try_module_get+0x68/0x100
> > [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
> > [36247.368189]  ? finish_task_switch+0x7d/0x290
> > [36247.368190]  ? __switch_to_asm+0x40/0x70
> > [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
> > [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36247.368209]  ? _raw_spin_lock+0x13/0x30
> > [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> > [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
> > [36247.368235]  SMB2_open+0x150/0x520 [cifs]
> > [36247.368238]  ? sched_clock+0x5/0x10
> > [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
> > [36247.368255]  open_shroot+0x12f/0x200 [cifs]
> > [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> > [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
> > [36247.368269]  ? walk_component+0x48/0x360
> > [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
> > [36247.368286]  ? filename_lookup+0xf8/0x1a0
> > [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> > [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
> > [36247.368313]  vfs_statx+0x89/0xe0
> > [36247.368315]  __do_sys_newlstat+0x39/0x70
> > [36247.368318]  do_syscall_64+0x55/0x100
> > [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36247.368321] RIP: 0033:0x7f562d3a0335
> > [36247.368323] Code: Bad RIP value.
> > [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> > [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> > [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> > [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> > [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> > [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> > [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36368.187097] node_exporter   D    0  3567      1 0x00000000
> > [36368.187102] Call Trace:
> > [36368.187117]  ? __schedule+0x338/0x850
> > [36368.187123]  ? preempt_count_add+0x67/0xb0
> > [36368.187128]  schedule+0x3c/0x90
> > [36368.187133]  schedule_preempt_disabled+0x14/0x20
> > [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
> > [36368.187175]  ? try_module_get+0x68/0x100
> > [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36368.187304]  ? __switch_to_asm+0x34/0x70
> > [36368.187307]  ? __switch_to_asm+0x40/0x70
> > [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
> > [36368.187375]  ? lookup_fast+0xc8/0x2e0
> > [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
> > [36368.187410]  ? lookup_fast+0xc8/0x2e0
> > [36368.187416]  ? ___cache_free+0x31/0x2e0
> > [36368.187421]  ? terminate_walk+0x93/0x100
> > [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
> > [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
> > [36368.187477]  statfs_by_dentry+0x6a/0x90
> > [36368.187482]  vfs_statfs+0x16/0xc0
> > [36368.187486]  user_statfs+0x50/0xa0
> > [36368.187491]  __do_sys_statfs+0x20/0x50
> > [36368.187499]  do_syscall_64+0x55/0x100
> > [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36368.187507] RIP: 0033:0x4a5c20
> > [36368.187517] Code: Bad RIP value.
> > [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> > [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> > [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> > [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> > [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> > [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> > [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> > [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
> > [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
> > [36368.187773] Call Trace:
> > [36368.187780]  ? __schedule+0x338/0x850
> > [36368.187785]  schedule+0x3c/0x90
> > [36368.187790]  schedule_preempt_disabled+0x14/0x20
> > [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
> > [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> > [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> > [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36368.187885]  ? _raw_spin_unlock+0x16/0x30
> > [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> > [36368.187942]  ? _raw_spin_unlock+0x16/0x30
> > [36368.187947]  ? preempt_count_add+0x67/0xb0
> > [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> > [36368.187984]  process_one_work+0x188/0x3a0
> > [36368.187990]  worker_thread+0x4c/0x3a0
> > [36368.187994]  ? preempt_count_add+0x67/0xb0
> > [36368.187998]  ? process_one_work+0x3a0/0x3a0
> > [36368.188001]  kthread+0xf8/0x130
> > [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
> > [36368.188009]  ret_from_fork+0x35/0x40
> > [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
> > [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36368.188182] puppet          D    0  3836    541 0x00000000
> > [36368.188186] Call Trace:
> > [36368.188191]  ? __schedule+0x338/0x850
> > [36368.188196]  schedule+0x3c/0x90
> > [36368.188200]  schedule_preempt_disabled+0x14/0x20
> > [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
> > [36368.188208]  ? __switch_to_asm+0x34/0x70
> > [36368.188211]  ? __switch_to_asm+0x40/0x70
> > [36368.188215]  ? try_module_get+0x68/0x100
> > [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
> > [36368.188280]  ? finish_task_switch+0x7d/0x290
> > [36368.188283]  ? __switch_to_asm+0x40/0x70
> > [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
> > [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36368.188346]  ? _raw_spin_lock+0x13/0x30
> > [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> > [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
> > [36368.188432]  SMB2_open+0x150/0x520 [cifs]
> > [36368.188440]  ? sched_clock+0x5/0x10
> > [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
> > [36368.188497]  open_shroot+0x12f/0x200 [cifs]
> > [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> > [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
> > [36368.188539]  ? walk_component+0x48/0x360
> > [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
> > [36368.188574]  ? filename_lookup+0xf8/0x1a0
> > [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> > [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
> > [36368.188664]  vfs_statx+0x89/0xe0
> > [36368.188670]  __do_sys_newlstat+0x39/0x70
> > [36368.188677]  do_syscall_64+0x55/0x100
> > [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36368.188684] RIP: 0033:0x7f562d3a0335
> > [36368.188689] Code: Bad RIP value.
> > [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> > [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> > [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> > [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> > [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> > [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> > [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36489.006458] node_exporter   D    0  3567      1 0x00000000
> > [36489.006461] Call Trace:
> > [36489.006470]  ? __schedule+0x338/0x850
> > [36489.006474]  ? preempt_count_add+0x67/0xb0
> > [36489.006476]  schedule+0x3c/0x90
> > [36489.006478]  schedule_preempt_disabled+0x14/0x20
> > [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
> > [36489.006485]  ? try_module_get+0x68/0x100
> > [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36489.006549]  ? __switch_to_asm+0x34/0x70
> > [36489.006550]  ? __switch_to_asm+0x40/0x70
> > [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
> > [36489.006583]  ? lookup_fast+0xc8/0x2e0
> > [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
> > [36489.006599]  ? lookup_fast+0xc8/0x2e0
> > [36489.006602]  ? ___cache_free+0x31/0x2e0
> > [36489.006604]  ? terminate_walk+0x93/0x100
> > [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
> > [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
> > [36489.006635]  statfs_by_dentry+0x6a/0x90
> > [36489.006637]  vfs_statfs+0x16/0xc0
> > [36489.006639]  user_statfs+0x50/0xa0
> > [36489.006641]  __do_sys_statfs+0x20/0x50
> > [36489.006645]  do_syscall_64+0x55/0x100
> > [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36489.006649] RIP: 0033:0x4a5c20
> > [36489.006654] Code: Bad RIP value.
> > [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> > [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> > [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> > [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> > [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> > [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> > [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> > [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
> > [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
> > [36489.007101] Call Trace:
> > [36489.007105]  ? __schedule+0x338/0x850
> > [36489.007107]  schedule+0x3c/0x90
> > [36489.007109]  schedule_preempt_disabled+0x14/0x20
> > [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
> > [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> > [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> > [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36489.007143]  ? _raw_spin_unlock+0x16/0x30
> > [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> > [36489.007161]  ? _raw_spin_unlock+0x16/0x30
> > [36489.007163]  ? preempt_count_add+0x67/0xb0
> > [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> > [36489.007176]  process_one_work+0x188/0x3a0
> > [36489.007178]  worker_thread+0x4c/0x3a0
> > [36489.007179]  ? preempt_count_add+0x67/0xb0
> > [36489.007180]  ? process_one_work+0x3a0/0x3a0
> > [36489.007181]  kthread+0xf8/0x130
> > [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
> > [36489.007184]  ret_from_fork+0x35/0x40
> > [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
> > [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36489.007540] puppet          D    0  3836    541 0x00000000
> > [36489.007541] Call Trace:
> > [36489.007545]  ? __schedule+0x338/0x850
> > [36489.007548]  schedule+0x3c/0x90
> > [36489.007550]  schedule_preempt_disabled+0x14/0x20
> > [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
> > [36489.007553]  ? __switch_to_asm+0x34/0x70
> > [36489.007555]  ? __switch_to_asm+0x40/0x70
> > [36489.007558]  ? try_module_get+0x68/0x100
> > [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
> > [36489.007598]  ? finish_task_switch+0x7d/0x290
> > [36489.007600]  ? __switch_to_asm+0x40/0x70
> > [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
> > [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36489.007630]  ? _raw_spin_lock+0x13/0x30
> > [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> > [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
> > [36489.007672]  SMB2_open+0x150/0x520 [cifs]
> > [36489.007677]  ? sched_clock+0x5/0x10
> > [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
> > [36489.007707]  open_shroot+0x12f/0x200 [cifs]
> > [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> > [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
> > [36489.007729]  ? walk_component+0x48/0x360
> > [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
> > [36489.007748]  ? filename_lookup+0xf8/0x1a0
> > [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> > [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
> > [36489.007794]  vfs_statx+0x89/0xe0
> > [36489.007797]  __do_sys_newlstat+0x39/0x70
> > [36489.007801]  do_syscall_64+0x55/0x100
> > [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36489.007805] RIP: 0033:0x7f562d3a0335
> > [36489.007808] Code: Bad RIP value.
> > [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> > [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> > [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> > [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> > [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> > [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> > [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [36609.825908] node_exporter   D    0  3567      1 0x00000000
> > [36609.825910] Call Trace:
> > [36609.825919]  ? __schedule+0x338/0x850
> > [36609.825922]  ? preempt_count_add+0x67/0xb0
> > [36609.825925]  schedule+0x3c/0x90
> > [36609.825927]  schedule_preempt_disabled+0x14/0x20
> > [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
> > [36609.825933]  ? try_module_get+0x68/0x100
> > [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36609.826001]  ? __switch_to_asm+0x34/0x70
> > [36609.826002]  ? __switch_to_asm+0x40/0x70
> > [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
> > [36609.826040]  ? lookup_fast+0xc8/0x2e0
> > [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
> > [36609.826060]  ? lookup_fast+0xc8/0x2e0
> > [36609.826064]  ? ___cache_free+0x31/0x2e0
> > [36609.826066]  ? terminate_walk+0x93/0x100
> > [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
> > [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
> > [36609.826103]  statfs_by_dentry+0x6a/0x90
> > [36609.826104]  vfs_statfs+0x16/0xc0
> > [36609.826106]  user_statfs+0x50/0xa0
> > [36609.826108]  __do_sys_statfs+0x20/0x50
> > [36609.826113]  do_syscall_64+0x55/0x100
> > [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36609.826117] RIP: 0033:0x4a5c20
> > [36609.826123] Code: Bad RIP value.
> > [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> > [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> > [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> > [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> > [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> > [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> > [73117.654990] CIFS VFS: Send error in SessSetup = -126
> >
> > Regards, Martijn
> >
> > >
> > > --
> > > Best regards,
> > > Pavel Shilovsky
> > >
> > > чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
> > > <martijn.de.gouw@prodrive-technologies.com>:
> > >>
> > >> Hi,
> > >>
> > >> On 04-07-2019 13:22, Aurélien Aptel wrote:
> > >>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> > >>>>> Are there any kernel oops/panic with stack traces and register dumps in
> > >>>>> the log?
> > >>>>>
> > >>>>> You can inspect the kernel stack trace of the hung processes (to see where
> > >>>>> they are stuck) by printing the file /proc/<pid>/stack.
> > >>>>
> > >>>> These are the stacks of all processes that are D, most of them being df.
> > >>>> I also attached the cifs Stats output below.
> > >>>
> > >>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> > >>> any?
> > >>
> > >> I did find the following messages in the dmesg of one of the servers:
> > >>
> > >> [    4.797893] FS-Cache: Duplicate cookie detected
> > >> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> > >> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> > >> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
> > >> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> > >> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
> > >> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
> > >> [    4.798643] FS-Cache: Duplicate cookie detected
> > >> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> > >> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> > >> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
> > >> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> > >> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
> > >> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
> > >> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
> > >> [12738.729173] CIFS VFS: Send error in SessSetup = -126
> > >> [99125.480751] CIFS VFS: Send error in SessSetup = -126
> > >> [185517.295175] CIFS VFS: Send error in SessSetup = -126
> > >> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > >> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
> > >> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
> > >> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
> > >> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > >> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> > >> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
> > >> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> > >> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> > >> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> > >> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> > >> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> > >> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> > >> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> > >> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> > >> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> > >> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> > >> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> [250515.750120] Call Trace:
> > >> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> > >> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
> > >> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
> > >> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
> > >> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
> > >> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> > >> [250515.750259]  ? finish_task_switch+0x7d/0x290
> > >> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
> > >> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > >> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> > >> [250515.751200]  ? kthread+0xf8/0x130
> > >> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> > >> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
> > >> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
> > >> [250515.752997]  ? ret_from_fork+0x35/0x40
> > >> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> > >> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
> > >> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
> > >> [250515.754766] CR2: ffff8ec52a6fe1d0
> > >> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> > >> [250515.759389] ---[ end trace 92ea62cd080150de ]---
> > >> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
> > >> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
> > >> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> > >> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> > >> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> > >> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
> > >> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> > >> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> > >> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> > >> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> > >> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> > >> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> > >> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> > >> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
> > >> [271909.743603] CIFS VFS: Send error in SessSetup = -126
> > >> [358301.908585] CIFS VFS: Send error in SessSetup = -126
> > >> [444693.566453] CIFS VFS: Send error in SessSetup = -126
> > >> [531086.090040] CIFS VFS: Send error in SessSetup = -126
> > >> [617477.785390] CIFS VFS: Send error in SessSetup = -126
> > >> [703869.556705] CIFS VFS: Send error in SessSetup = -126
> > >> [790261.656853] CIFS VFS: Send error in SessSetup = -126
> > >> [876653.496928] CIFS VFS: Send error in SessSetup = -126
> > >> [963045.816742] CIFS VFS: Send error in SessSetup = -126
> > >> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
> > >>
> > >>
> > >> And from another server:
> > >> [    4.253091] FS-Cache: Duplicate cookie detected
> > >> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> > >> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> > >> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
> > >> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> > >> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
> > >> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
> > >> [    4.254107] FS-Cache: Duplicate cookie detected
> > >> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> > >> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> > >> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
> > >> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> > >> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
> > >> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
> > >> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
> > >> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
> > >> [65206.127542] CIFS VFS: Send error in SessSetup = -126
> > >> [151597.808064] CIFS VFS: Send error in SessSetup = -126
> > >> [237989.956447] CIFS VFS: Send error in SessSetup = -126
> > >> [324380.937984] CIFS VFS: Send error in SessSetup = -126
> > >> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> > >> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
> > >> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
> > >> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
> > >> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > >> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> > >> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> > >> [402750.869947] RIP: 0010:0xffff909bf61609d0
> > >> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> > >> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> > >> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
> > >> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> > >> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> > >> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> > >> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> > >> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
> > >> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> > >> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> > >> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> > >> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> > >> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> > >> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> > >> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> > >> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
> > >> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> > >> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
> > >> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> [402750.872324] Call Trace:
> > >> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
> > >> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
> > >> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
> > >> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> > >> [402750.876555]  ? try_to_wake_up+0x54/0x540
> > >> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
> > >> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
> > >> [402750.877834]  ? finish_task_switch+0x7d/0x290
> > >> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
> > >> [402750.878651]  ? kthread+0xf8/0x130
> > >> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
> > >> [402750.879412]  ? ret_from_fork+0x35/0x40
> > >> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
> > >> [402750.883384] CR2: ffff909bf61609d0
> > >> [402750.883783] ---[ end trace 08b06875e82513eb ]---
> > >> [402750.884200] RIP: 0010:0xffff909bf61609d0
> > >> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> > >> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> > >> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> > >> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> > >> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> > >> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> > >> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> > >> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> > >> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> > >> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> [410772.901264] CIFS VFS: Send error in SessSetup = -126
> > >> [497165.194662] CIFS VFS: Send error in SessSetup = -126
> > >> [583557.492304] CIFS VFS: Send error in SessSetup = -126
> > >> [669948.937016] CIFS VFS: Send error in SessSetup = -126
> > >> [756341.390112] CIFS VFS: Send error in SessSetup = -126
> > >> [842733.557002] CIFS VFS: Send error in SessSetup = -126
> > >> [929125.303428] CIFS VFS: Send error in SessSetup = -126
> > >> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
> > >> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
> > >> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
> > >> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
> > >>
> > >>
> > >>>
> > >>> The individual stack dumps are pretty useful. Here is my theory:
> > >>>
> > >>>> pid: 9505
> > >>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
> > >>>> 0x7ffede42e8f8 0x7f7f8928f295
> > >>>> [<0>] open_shroot+0x43/0x200 [cifs]
> > >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> > >>>
> > >>> Almost all of the processes have the same stack trace. They are stuck at
> > >>> open_shroot()+0x43 which is probably
> > >>>
> > >>>       mutex_lock(&tcon->crfid.fid_mutex);
> > >>>
> > >>> Then there are only 2 other processes stuck somewhere in the same code path
> > >>> (open_shroot) but deeper, meaning they have the locks that the other
> > >>> processes are waiting for:
> > >>>
> > >>>
> > >>>> pid: 22858
> > >>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
> > >>>> 0x7ffcea3f99d8 0x7f6cc78c7295
> > >>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > >>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> > >>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> > >>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> > >>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> > >>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> > >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> > >>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> > >>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > >>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> > >>>> [<0>] vfs_statx+0x89/0xe0
> > >>>> [<0>] __do_sys_newstat+0x39/0x70
> > >>>> [<0>] do_syscall_64+0x55/0x100
> > >>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >>>> [<0>] 0xffffffffffffffff
> > >>>
> > >>>
> > >>>> pid: 20027
> > >>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
> > >>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> > >>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> > >>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> > >>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> > >>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> > >>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> > >>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> > >>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> > >>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> > >>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > >>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> > >>>> [<0>] vfs_statx+0x89/0xe0
> > >>>> [<0>] __do_sys_newstat+0x39/0x70
> > >>>> [<0>] do_syscall_64+0x55/0x100
> > >>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >>>> [<0>] 0xffffffffffffffff
> > >>>
> > >>> Due to timeouts maybe the Open request needs
> > >>> to reconnect the server/ses/tcon and to do this it calls
> > >>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
> > >>>
> > >>>           spin_lock(&tcon->open_file_lock);
> > >>>           list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
> > >>>                   open_file = list_entry(tmp, struct cifsFileInfo, tlist);
> > >>>                   open_file->invalidHandle = true;
> > >>>                   open_file->oplock_break_cancelled = true;
> > >>>           }
> > >>>           spin_unlock(&tcon->open_file_lock);
> > >>>
> > >>>           mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
> > >>>           tcon->crfid.is_valid = false;
> > >>>           memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
> > >>>           mutex_unlock(&tcon->crfid.fid_mutex);
> > >>>
> > >>> I think these processes are trying to lock the same lock twice: one in
> > >>> open_shroot() and since Open ends up having to reconnect, once again in
> > >>> mark_open_files_invalid(). I think it's the same lock because I don't
> > >>> see why the tcon pointers would be different in those 2 spots. Kernel
> > >>> mutexes are not reentrant so this is a deadlock.
> > >>
> > >> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
> > >>
> > >> Regards, Martijn
> > >>
> > >>>
> > >>> Cheers,
> > >>>
> > >>
> > >> --
> > >> Martijn de Gouw
> > >> Designer
> > >> Prodrive Technologies
> > >> Mobile: +31 63 17 76 161
> > >> Phone:  +31 40 26 76 200
> >
> > --
> > Martijn de Gouw
> > Designer
> > Prodrive Technologies
> > Mobile: +31 63 17 76 161
> > Phone:  +31 40 26 76 200
>
>
>
> --
> Thanks,
>
> Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-06 13:05             ` Steve French
  2019-07-06 16:40               ` Pavel Shilovsky
@ 2019-07-08  9:04               ` Martijn de Gouw
  2019-07-08 11:06                 ` Aurélien Aptel
  1 sibling, 1 reply; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-08  9:04 UTC (permalink / raw)
  To: Steve French; +Cc: Pavel Shilovsky, Aurélien Aptel, linux-cifs

Hi

On 06-07-2019 15:05, Steve French wrote:
> Are we sure this isn't something really simple like the server IP
> address changing?  Remember in 5.0 kernel (also updated cifs-utils)

I don't think our file servers ever change ip. Neither do the domain 
controllers. But could this be related to dfs, where the referrals are 
provided in a round-robin like fashion?

Regards, Martijn

> the code to allow failover when server's IP address changes was added.
> 
> On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
> <martijn.de.gouw@prodrive-technologies.com> wrote:
>>
>> Hi Pavel,
>>
>> On 04-07-2019 23:34, Pavel Shilovsky wrote:
>>> Hi Martijn,
>>>
>>> Thanks for reporting the problem. Have you tried v5.1.5 kernel and
>>> above? That one has many reconnect fixes. If yes, could you please
>>> provide the kernel stack / panic traces when running new versions of
>>> the kernel?
>>
>> I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
>>
>> Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
>> performed on these mounts is the check if they are still mounted (by puppet).
>>
>>
>> [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>> [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36247.367648] node_exporter   D    0  3567      1 0x00000000
>> [36247.367650] Call Trace:
>> [36247.367668]  ? __schedule+0x338/0x850
>> [36247.367671]  ? preempt_count_add+0x67/0xb0
>> [36247.367673]  schedule+0x3c/0x90
>> [36247.367675]  schedule_preempt_disabled+0x14/0x20
>> [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
>> [36247.367680]  ? try_module_get+0x68/0x100
>> [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36247.367826]  ? __switch_to_asm+0x34/0x70
>> [36247.367827]  ? __switch_to_asm+0x40/0x70
>> [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
>> [36247.367848]  ? lookup_fast+0xc8/0x2e0
>> [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
>> [36247.367861]  ? lookup_fast+0xc8/0x2e0
>> [36247.367863]  ? ___cache_free+0x31/0x2e0
>> [36247.367865]  ? terminate_walk+0x93/0x100
>> [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
>> [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
>> [36247.367894]  statfs_by_dentry+0x6a/0x90
>> [36247.367895]  vfs_statfs+0x16/0xc0
>> [36247.367897]  user_statfs+0x50/0xa0
>> [36247.367898]  __do_sys_statfs+0x20/0x50
>> [36247.367902]  do_syscall_64+0x55/0x100
>> [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36247.367905] RIP: 0033:0x4a5c20
>> [36247.367936] Code: Bad RIP value.
>> [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>> [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>> [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>> [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>> [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>> [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>> [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>> [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
>> [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
>> [36247.368021] Call Trace:
>> [36247.368023]  ? __schedule+0x338/0x850
>> [36247.368025]  schedule+0x3c/0x90
>> [36247.368026]  schedule_preempt_disabled+0x14/0x20
>> [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
>> [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>> [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>> [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36247.368055]  ? _raw_spin_unlock+0x16/0x30
>> [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>> [36247.368073]  ? _raw_spin_unlock+0x16/0x30
>> [36247.368074]  ? preempt_count_add+0x67/0xb0
>> [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>> [36247.368090]  process_one_work+0x188/0x3a0
>> [36247.368092]  worker_thread+0x4c/0x3a0
>> [36247.368094]  ? preempt_count_add+0x67/0xb0
>> [36247.368095]  ? process_one_work+0x3a0/0x3a0
>> [36247.368097]  kthread+0xf8/0x130
>> [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [36247.368101]  ret_from_fork+0x35/0x40
>> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
>> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36247.368158] puppet          D    0  3836    541 0x00000000
>> [36247.368159] Call Trace:
>> [36247.368161]  ? __schedule+0x338/0x850
>> [36247.368163]  schedule+0x3c/0x90
>> [36247.368164]  schedule_preempt_disabled+0x14/0x20
>> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
>> [36247.368166]  ? __switch_to_asm+0x34/0x70
>> [36247.368167]  ? __switch_to_asm+0x40/0x70
>> [36247.368169]  ? try_module_get+0x68/0x100
>> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
>> [36247.368189]  ? finish_task_switch+0x7d/0x290
>> [36247.368190]  ? __switch_to_asm+0x40/0x70
>> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
>> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36247.368209]  ? _raw_spin_lock+0x13/0x30
>> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
>> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
>> [36247.368238]  ? sched_clock+0x5/0x10
>> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
>> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
>> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
>> [36247.368269]  ? walk_component+0x48/0x360
>> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
>> [36247.368286]  ? filename_lookup+0xf8/0x1a0
>> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
>> [36247.368313]  vfs_statx+0x89/0xe0
>> [36247.368315]  __do_sys_newlstat+0x39/0x70
>> [36247.368318]  do_syscall_64+0x55/0x100
>> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36247.368321] RIP: 0033:0x7f562d3a0335
>> [36247.368323] Code: Bad RIP value.
>> [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>> [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>> [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>> [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>> [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>> [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>> [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>> [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36368.187097] node_exporter   D    0  3567      1 0x00000000
>> [36368.187102] Call Trace:
>> [36368.187117]  ? __schedule+0x338/0x850
>> [36368.187123]  ? preempt_count_add+0x67/0xb0
>> [36368.187128]  schedule+0x3c/0x90
>> [36368.187133]  schedule_preempt_disabled+0x14/0x20
>> [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
>> [36368.187175]  ? try_module_get+0x68/0x100
>> [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36368.187304]  ? __switch_to_asm+0x34/0x70
>> [36368.187307]  ? __switch_to_asm+0x40/0x70
>> [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
>> [36368.187375]  ? lookup_fast+0xc8/0x2e0
>> [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
>> [36368.187410]  ? lookup_fast+0xc8/0x2e0
>> [36368.187416]  ? ___cache_free+0x31/0x2e0
>> [36368.187421]  ? terminate_walk+0x93/0x100
>> [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
>> [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
>> [36368.187477]  statfs_by_dentry+0x6a/0x90
>> [36368.187482]  vfs_statfs+0x16/0xc0
>> [36368.187486]  user_statfs+0x50/0xa0
>> [36368.187491]  __do_sys_statfs+0x20/0x50
>> [36368.187499]  do_syscall_64+0x55/0x100
>> [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36368.187507] RIP: 0033:0x4a5c20
>> [36368.187517] Code: Bad RIP value.
>> [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>> [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>> [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>> [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>> [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>> [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>> [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>> [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
>> [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
>> [36368.187773] Call Trace:
>> [36368.187780]  ? __schedule+0x338/0x850
>> [36368.187785]  schedule+0x3c/0x90
>> [36368.187790]  schedule_preempt_disabled+0x14/0x20
>> [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
>> [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>> [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>> [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36368.187885]  ? _raw_spin_unlock+0x16/0x30
>> [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>> [36368.187942]  ? _raw_spin_unlock+0x16/0x30
>> [36368.187947]  ? preempt_count_add+0x67/0xb0
>> [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>> [36368.187984]  process_one_work+0x188/0x3a0
>> [36368.187990]  worker_thread+0x4c/0x3a0
>> [36368.187994]  ? preempt_count_add+0x67/0xb0
>> [36368.187998]  ? process_one_work+0x3a0/0x3a0
>> [36368.188001]  kthread+0xf8/0x130
>> [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [36368.188009]  ret_from_fork+0x35/0x40
>> [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
>> [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36368.188182] puppet          D    0  3836    541 0x00000000
>> [36368.188186] Call Trace:
>> [36368.188191]  ? __schedule+0x338/0x850
>> [36368.188196]  schedule+0x3c/0x90
>> [36368.188200]  schedule_preempt_disabled+0x14/0x20
>> [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
>> [36368.188208]  ? __switch_to_asm+0x34/0x70
>> [36368.188211]  ? __switch_to_asm+0x40/0x70
>> [36368.188215]  ? try_module_get+0x68/0x100
>> [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
>> [36368.188280]  ? finish_task_switch+0x7d/0x290
>> [36368.188283]  ? __switch_to_asm+0x40/0x70
>> [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
>> [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36368.188346]  ? _raw_spin_lock+0x13/0x30
>> [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>> [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
>> [36368.188432]  SMB2_open+0x150/0x520 [cifs]
>> [36368.188440]  ? sched_clock+0x5/0x10
>> [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
>> [36368.188497]  open_shroot+0x12f/0x200 [cifs]
>> [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>> [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
>> [36368.188539]  ? walk_component+0x48/0x360
>> [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
>> [36368.188574]  ? filename_lookup+0xf8/0x1a0
>> [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>> [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>> [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
>> [36368.188664]  vfs_statx+0x89/0xe0
>> [36368.188670]  __do_sys_newlstat+0x39/0x70
>> [36368.188677]  do_syscall_64+0x55/0x100
>> [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36368.188684] RIP: 0033:0x7f562d3a0335
>> [36368.188689] Code: Bad RIP value.
>> [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>> [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>> [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>> [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>> [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>> [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>> [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>> [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36489.006458] node_exporter   D    0  3567      1 0x00000000
>> [36489.006461] Call Trace:
>> [36489.006470]  ? __schedule+0x338/0x850
>> [36489.006474]  ? preempt_count_add+0x67/0xb0
>> [36489.006476]  schedule+0x3c/0x90
>> [36489.006478]  schedule_preempt_disabled+0x14/0x20
>> [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
>> [36489.006485]  ? try_module_get+0x68/0x100
>> [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36489.006549]  ? __switch_to_asm+0x34/0x70
>> [36489.006550]  ? __switch_to_asm+0x40/0x70
>> [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
>> [36489.006583]  ? lookup_fast+0xc8/0x2e0
>> [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
>> [36489.006599]  ? lookup_fast+0xc8/0x2e0
>> [36489.006602]  ? ___cache_free+0x31/0x2e0
>> [36489.006604]  ? terminate_walk+0x93/0x100
>> [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
>> [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
>> [36489.006635]  statfs_by_dentry+0x6a/0x90
>> [36489.006637]  vfs_statfs+0x16/0xc0
>> [36489.006639]  user_statfs+0x50/0xa0
>> [36489.006641]  __do_sys_statfs+0x20/0x50
>> [36489.006645]  do_syscall_64+0x55/0x100
>> [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36489.006649] RIP: 0033:0x4a5c20
>> [36489.006654] Code: Bad RIP value.
>> [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>> [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>> [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>> [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>> [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>> [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>> [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>> [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
>> [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
>> [36489.007101] Call Trace:
>> [36489.007105]  ? __schedule+0x338/0x850
>> [36489.007107]  schedule+0x3c/0x90
>> [36489.007109]  schedule_preempt_disabled+0x14/0x20
>> [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
>> [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>> [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>> [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36489.007143]  ? _raw_spin_unlock+0x16/0x30
>> [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>> [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>> [36489.007161]  ? _raw_spin_unlock+0x16/0x30
>> [36489.007163]  ? preempt_count_add+0x67/0xb0
>> [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>> [36489.007176]  process_one_work+0x188/0x3a0
>> [36489.007178]  worker_thread+0x4c/0x3a0
>> [36489.007179]  ? preempt_count_add+0x67/0xb0
>> [36489.007180]  ? process_one_work+0x3a0/0x3a0
>> [36489.007181]  kthread+0xf8/0x130
>> [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [36489.007184]  ret_from_fork+0x35/0x40
>> [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
>> [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36489.007540] puppet          D    0  3836    541 0x00000000
>> [36489.007541] Call Trace:
>> [36489.007545]  ? __schedule+0x338/0x850
>> [36489.007548]  schedule+0x3c/0x90
>> [36489.007550]  schedule_preempt_disabled+0x14/0x20
>> [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
>> [36489.007553]  ? __switch_to_asm+0x34/0x70
>> [36489.007555]  ? __switch_to_asm+0x40/0x70
>> [36489.007558]  ? try_module_get+0x68/0x100
>> [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
>> [36489.007598]  ? finish_task_switch+0x7d/0x290
>> [36489.007600]  ? __switch_to_asm+0x40/0x70
>> [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
>> [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36489.007630]  ? _raw_spin_lock+0x13/0x30
>> [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>> [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
>> [36489.007672]  SMB2_open+0x150/0x520 [cifs]
>> [36489.007677]  ? sched_clock+0x5/0x10
>> [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
>> [36489.007707]  open_shroot+0x12f/0x200 [cifs]
>> [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>> [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
>> [36489.007729]  ? walk_component+0x48/0x360
>> [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
>> [36489.007748]  ? filename_lookup+0xf8/0x1a0
>> [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>> [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>> [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
>> [36489.007794]  vfs_statx+0x89/0xe0
>> [36489.007797]  __do_sys_newlstat+0x39/0x70
>> [36489.007801]  do_syscall_64+0x55/0x100
>> [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36489.007805] RIP: 0033:0x7f562d3a0335
>> [36489.007808] Code: Bad RIP value.
>> [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>> [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>> [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>> [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>> [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>> [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>> [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>> [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [36609.825908] node_exporter   D    0  3567      1 0x00000000
>> [36609.825910] Call Trace:
>> [36609.825919]  ? __schedule+0x338/0x850
>> [36609.825922]  ? preempt_count_add+0x67/0xb0
>> [36609.825925]  schedule+0x3c/0x90
>> [36609.825927]  schedule_preempt_disabled+0x14/0x20
>> [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
>> [36609.825933]  ? try_module_get+0x68/0x100
>> [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>> [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
>> [36609.826001]  ? __switch_to_asm+0x34/0x70
>> [36609.826002]  ? __switch_to_asm+0x40/0x70
>> [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
>> [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
>> [36609.826040]  ? lookup_fast+0xc8/0x2e0
>> [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
>> [36609.826060]  ? lookup_fast+0xc8/0x2e0
>> [36609.826064]  ? ___cache_free+0x31/0x2e0
>> [36609.826066]  ? terminate_walk+0x93/0x100
>> [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
>> [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
>> [36609.826103]  statfs_by_dentry+0x6a/0x90
>> [36609.826104]  vfs_statfs+0x16/0xc0
>> [36609.826106]  user_statfs+0x50/0xa0
>> [36609.826108]  __do_sys_statfs+0x20/0x50
>> [36609.826113]  do_syscall_64+0x55/0x100
>> [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [36609.826117] RIP: 0033:0x4a5c20
>> [36609.826123] Code: Bad RIP value.
>> [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>> [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>> [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>> [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>> [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>> [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>> [73117.654990] CIFS VFS: Send error in SessSetup = -126
>>
>> Regards, Martijn
>>
>>>
>>> --
>>> Best regards,
>>> Pavel Shilovsky
>>>
>>> чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
>>> <martijn.de.gouw@prodrive-technologies.com>:
>>>>
>>>> Hi,
>>>>
>>>> On 04-07-2019 13:22, Aurélien Aptel wrote:
>>>>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>>>>>>> Are there any kernel oops/panic with stack traces and register dumps in
>>>>>>> the log?
>>>>>>>
>>>>>>> You can inspect the kernel stack trace of the hung processes (to see where
>>>>>>> they are stuck) by printing the file /proc/<pid>/stack.
>>>>>>
>>>>>> These are the stacks of all processes that are D, most of them being df.
>>>>>> I also attached the cifs Stats output below.
>>>>>
>>>>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
>>>>> any?
>>>>
>>>> I did find the following messages in the dmesg of one of the servers:
>>>>
>>>> [    4.797893] FS-Cache: Duplicate cookie detected
>>>> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
>>>> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
>>>> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
>>>> [    4.798643] FS-Cache: Duplicate cookie detected
>>>> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
>>>> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
>>>> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
>>>> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
>>>> [12738.729173] CIFS VFS: Send error in SessSetup = -126
>>>> [99125.480751] CIFS VFS: Send error in SessSetup = -126
>>>> [185517.295175] CIFS VFS: Send error in SessSetup = -126
>>>> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
>>>> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
>>>> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
>>>> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
>>>> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [250515.750120] Call Trace:
>>>> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
>>>> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
>>>> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
>>>> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>> [250515.750259]  ? finish_task_switch+0x7d/0x290
>>>> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
>>>> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>> [250515.751200]  ? kthread+0xf8/0x130
>>>> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
>>>> [250515.752997]  ? ret_from_fork+0x35/0x40
>>>> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
>>>> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
>>>> [250515.754766] CR2: ffff8ec52a6fe1d0
>>>> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>> [250515.759389] ---[ end trace 92ea62cd080150de ]---
>>>> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
>>>> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
>>>> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
>>>> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
>>>> [271909.743603] CIFS VFS: Send error in SessSetup = -126
>>>> [358301.908585] CIFS VFS: Send error in SessSetup = -126
>>>> [444693.566453] CIFS VFS: Send error in SessSetup = -126
>>>> [531086.090040] CIFS VFS: Send error in SessSetup = -126
>>>> [617477.785390] CIFS VFS: Send error in SessSetup = -126
>>>> [703869.556705] CIFS VFS: Send error in SessSetup = -126
>>>> [790261.656853] CIFS VFS: Send error in SessSetup = -126
>>>> [876653.496928] CIFS VFS: Send error in SessSetup = -126
>>>> [963045.816742] CIFS VFS: Send error in SessSetup = -126
>>>> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
>>>>
>>>>
>>>> And from another server:
>>>> [    4.253091] FS-Cache: Duplicate cookie detected
>>>> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
>>>> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
>>>> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
>>>> [    4.254107] FS-Cache: Duplicate cookie detected
>>>> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
>>>> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
>>>> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
>>>> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
>>>> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
>>>> [65206.127542] CIFS VFS: Send error in SessSetup = -126
>>>> [151597.808064] CIFS VFS: Send error in SessSetup = -126
>>>> [237989.956447] CIFS VFS: Send error in SessSetup = -126
>>>> [324380.937984] CIFS VFS: Send error in SessSetup = -126
>>>> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
>>>> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
>>>> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
>>>> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>> [402750.869947] RIP: 0010:0xffff909bf61609d0
>>>> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
>>>> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
>>>> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
>>>> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
>>>> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [402750.872324] Call Trace:
>>>> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
>>>> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
>>>> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
>>>> [402750.876555]  ? try_to_wake_up+0x54/0x540
>>>> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
>>>> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
>>>> [402750.877834]  ? finish_task_switch+0x7d/0x290
>>>> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>> [402750.878651]  ? kthread+0xf8/0x130
>>>> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>> [402750.879412]  ? ret_from_fork+0x35/0x40
>>>> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
>>>> [402750.883384] CR2: ffff909bf61609d0
>>>> [402750.883783] ---[ end trace 08b06875e82513eb ]---
>>>> [402750.884200] RIP: 0010:0xffff909bf61609d0
>>>> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [410772.901264] CIFS VFS: Send error in SessSetup = -126
>>>> [497165.194662] CIFS VFS: Send error in SessSetup = -126
>>>> [583557.492304] CIFS VFS: Send error in SessSetup = -126
>>>> [669948.937016] CIFS VFS: Send error in SessSetup = -126
>>>> [756341.390112] CIFS VFS: Send error in SessSetup = -126
>>>> [842733.557002] CIFS VFS: Send error in SessSetup = -126
>>>> [929125.303428] CIFS VFS: Send error in SessSetup = -126
>>>> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
>>>> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
>>>> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
>>>> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
>>>>
>>>>
>>>>>
>>>>> The individual stack dumps are pretty useful. Here is my theory:
>>>>>
>>>>>> pid: 9505
>>>>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
>>>>>> 0x7ffede42e8f8 0x7f7f8928f295
>>>>>> [<0>] open_shroot+0x43/0x200 [cifs]
>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>
>>>>> Almost all of the processes have the same stack trace. They are stuck at
>>>>> open_shroot()+0x43 which is probably
>>>>>
>>>>>        mutex_lock(&tcon->crfid.fid_mutex);
>>>>>
>>>>> Then there are only 2 other processes stuck somewhere in the same code path
>>>>> (open_shroot) but deeper, meaning they have the locks that the other
>>>>> processes are waiting for:
>>>>>
>>>>>
>>>>>> pid: 22858
>>>>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
>>>>>> 0x7ffcea3f99d8 0x7f6cc78c7295
>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>>
>>>>>> pid: 20027
>>>>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
>>>>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> Due to timeouts maybe the Open request needs
>>>>> to reconnect the server/ses/tcon and to do this it calls
>>>>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
>>>>>
>>>>>            spin_lock(&tcon->open_file_lock);
>>>>>            list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>>>>>                    open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>>>>>                    open_file->invalidHandle = true;
>>>>>                    open_file->oplock_break_cancelled = true;
>>>>>            }
>>>>>            spin_unlock(&tcon->open_file_lock);
>>>>>
>>>>>            mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>>>>>            tcon->crfid.is_valid = false;
>>>>>            memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>>>>>            mutex_unlock(&tcon->crfid.fid_mutex);
>>>>>
>>>>> I think these processes are trying to lock the same lock twice: one in
>>>>> open_shroot() and since Open ends up having to reconnect, once again in
>>>>> mark_open_files_invalid(). I think it's the same lock because I don't
>>>>> see why the tcon pointers would be different in those 2 spots. Kernel
>>>>> mutexes are not reentrant so this is a deadlock.
>>>>
>>>> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
>>>>
>>>> Regards, Martijn
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>
>>>> --
>>>> Martijn de Gouw
>>>> Designer
>>>> Prodrive Technologies
>>>> Mobile: +31 63 17 76 161
>>>> Phone:  +31 40 26 76 200
>>
>> --
>> Martijn de Gouw
>> Designer
>> Prodrive Technologies
>> Mobile: +31 63 17 76 161
>> Phone:  +31 40 26 76 200
> 
> 
> 

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-08  9:04               ` Martijn de Gouw
@ 2019-07-08 11:06                 ` Aurélien Aptel
  2019-07-08 13:18                   ` Martijn de Gouw
  0 siblings, 1 reply; 16+ messages in thread
From: Aurélien Aptel @ 2019-07-08 11:06 UTC (permalink / raw)
  To: Martijn de Gouw, Steve French; +Cc: Pavel Shilovsky, linux-cifs

Martijn, can you give a try to the PATCH (v2) I sent?

I wrote it on top of the 4.20 stable tree. I understand the process to
apply compile and install is not the easiest one but it would be
extremely helpful since you can reproduce the issue.

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-08 11:06                 ` Aurélien Aptel
@ 2019-07-08 13:18                   ` Martijn de Gouw
  0 siblings, 0 replies; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-08 13:18 UTC (permalink / raw)
  To: Aurélien Aptel, Steve French; +Cc: Pavel Shilovsky, linux-cifs

Hi Aurélien,

On 08-07-2019 13:06, Aurélien Aptel wrote:
> Martijn, can you give a try to the PATCH (v2) I sent?
> 
> I wrote it on top of the 4.20 stable tree. I understand the process to
> apply compile and install is not the easiest one but it would be
> extremely helpful since you can reproduce the issue.

I was able to schedule a reboot some of the servers later this evening.
Writing the patch for 4.20 is much appreciated!

Regards, Martijn

> 
> Cheers,
> 

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-06 16:40               ` Pavel Shilovsky
@ 2019-07-09  8:48                 ` Martijn de Gouw
  2019-07-15 23:40                   ` Pavel Shilovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Martijn de Gouw @ 2019-07-09  8:48 UTC (permalink / raw)
  To: Pavel Shilovsky, Steve French; +Cc: Aurélien Aptel, linux-cifs

Hi, 

Here is another kernel log, but it looks like a different issue.
All cifs mounts have the nohandlecache option set.
This server is running the kernel without the V2 patch:

[706049.499153] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[706049.499183] BUG: unable to handle kernel paging request at ffff8e6381d549d0
[706049.499222] PGD 175c02067 P4D 175c02067 PUD 262470063 PMD 8000000101c000e3 
[706049.499245] Oops: 0011 [#1] PREEMPT SMP PTI
[706049.499258] CPU: 0 PID: 77140 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
[706049.499280] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
[706049.499320] RIP: 0010:0xffff8e6381d549d0
[706049.499332] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
[706049.499374] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
[706049.499388] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
[706049.499405] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
[706049.499422] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
[706049.499439] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
[706049.499456] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
[706049.499473] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
[706049.499495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[706049.499516] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
[706049.499589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[706049.499607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[706049.499625] Call Trace:
[706049.499934] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[706049.500076]  ? cifs_reconnect+0x337/0x880 [cifs]
[706049.500764] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
[706049.501157]  ? cifs_handle_standard+0x169/0x190 [cifs]
[706049.501166]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
[706049.501867] 00000010: 00000001 00000098 0000046b 00000000  ........k.......
[706049.502259]  ? finish_task_switch+0x7d/0x290
[706049.502980] 00000020: 000002c3 00000001 00000905 00007400  .............t..
[706049.503352]  ? cifs_handle_standard+0x190/0x190 [cifs]
[706049.504057] 00000030: 00000000 00000000 00000000 00000000  ................
[706049.504475]  ? kthread+0xf8/0x130
[706049.504478]  ? kthread_create_worker_on_cpu+0x70/0x70
[706049.505247] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[706049.505681]  ? ret_from_fork+0x35/0x40
[706049.506498] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
[706049.506926] Modules linked in: dm_snapshot(E) dm_bufio(E) cpufreq_userspace(E) cpufreq_conservative(E) cpufreq_powersave(E) arc4(E) ecb(E) nfsv3(E) md4(E) nfs_acl(E) sha512_ssse3(E) nfs(E) sha512_generic(E) cmac(E) hmac(E) lockd(E) nls_utf8(E) cifs(E) grace(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) vmwgfx(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) sg(E) cryptd(E) joydev(E) evdev(E) vmw_balloon(E) drm(E) glue_helper(E) vmw_vmci(E) serio_raw(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) libata(E) vmw_pvscsi(E) crc32c_intel(E) psmouse(E) e1000e(E) scsi_mod(E) i2c_piix4(E)
[706049.507729] 00000010: 00000005 00000068 0000046c 00000000  ....h...l.......
[706049.508166] CR2: ffff8e6381d549d0
[706049.508965] 00000020: 000002c3 00000001 00000905 00007400  .............t..
[706049.513140] ---[ end trace e371dfb71e7921db ]---
[706049.513143] RIP: 0010:0xffff8e6381d549d0
[706049.513144] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
[706049.513145] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
[706049.513147] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
[706049.513147] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
[706049.513148] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
[706049.513148] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
[706049.513149] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
[706049.513150] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
[706049.513151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[706049.513151] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
[706049.513201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[706049.521671] 00000030: 00000000 00000000 00000000 00000000  ................
[706049.522058] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[706049.523846] CIFS VFS: No task to wake, unknown frame received! NumMids 3
[706049.524699] 00000000: 424d53fe 00010040 00000000 00060006  .SMB@...........
[706049.525574] 00000010: 00000005 00000000 0000046d 00000000  ........m.......
[706049.526449] 00000020: 000002c3 00000001 00000905 00007400  .............t..
[706049.527304] 00000030: 00000000 00000000 00000000 00000000  ................
[707493.825696] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
[707615.670866] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
[707737.516154] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting... 

Regards, Martijn

On 06-07-2019 18:40, Pavel Shilovsky wrote:
> Martijn, thanks for the traces. This is a deadlock as mentioned by Aurelien:
> 
> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [36247.368158] puppet          D    0  3836    541 0x00000000
> [36247.368159] Call Trace:
> [36247.368161]  ? __schedule+0x338/0x850
> [36247.368163]  schedule+0x3c/0x90
> [36247.368164]  schedule_preempt_disabled+0x14/0x20
> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
> [36247.368166]  ? __switch_to_asm+0x34/0x70
> [36247.368167]  ? __switch_to_asm+0x40/0x70
> [36247.368169]  ? try_module_get+0x68/0x100
> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
> [36247.368189]  ? finish_task_switch+0x7d/0x290
> [36247.368190]  ? __switch_to_asm+0x40/0x70
> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
> [36247.368209]  ? _raw_spin_lock+0x13/0x30
> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
> [36247.368238]  ? sched_clock+0x5/0x10
> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
> [36247.368269]  ? walk_component+0x48/0x360
> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
> [36247.368286]  ? filename_lookup+0xf8/0x1a0
> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
> [36247.368313]  vfs_statx+0x89/0xe0
> [36247.368315]  __do_sys_newlstat+0x39/0x70
> [36247.368318]  do_syscall_64+0x55/0x100
> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [36247.368321] RIP: 0033:0x7f562d3a0335
> [36247.368323] Code: Bad RIP value.
> 
> This looks urgent to fix - we have a parallel thread with the similar
> problem observed on 5.1 kernel. In order to fix it we need to drop
> open shroot mutex before going into SMB2_open_init and other PDU
> functions.
> 
> You can try to work around this by specifying "nohandlecache" mount option.
> 
> --
> Best regards,
> Pavel Shilovsky
> 
> сб, 6 июл. 2019 г. в 06:05, Steve French <smfrench@gmail.com>:
>>
>> Are we sure this isn't something really simple like the server IP
>> address changing?  Remember in 5.0 kernel (also updated cifs-utils)
>> the code to allow failover when server's IP address changes was added.
>>
>> On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
>> <martijn.de.gouw@prodrive-technologies.com> wrote:
>>>
>>> Hi Pavel,
>>>
>>> On 04-07-2019 23:34, Pavel Shilovsky wrote:
>>>> Hi Martijn,
>>>>
>>>> Thanks for reporting the problem. Have you tried v5.1.5 kernel and
>>>> above? That one has many reconnect fixes. If yes, could you please
>>>> provide the kernel stack / panic traces when running new versions of
>>>> the kernel?
>>>
>>> I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
>>>
>>> Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
>>> performed on these mounts is the check if they are still mounted (by puppet).
>>>
>>>
>>> [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>> [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36247.367648] node_exporter   D    0  3567      1 0x00000000
>>> [36247.367650] Call Trace:
>>> [36247.367668]  ? __schedule+0x338/0x850
>>> [36247.367671]  ? preempt_count_add+0x67/0xb0
>>> [36247.367673]  schedule+0x3c/0x90
>>> [36247.367675]  schedule_preempt_disabled+0x14/0x20
>>> [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
>>> [36247.367680]  ? try_module_get+0x68/0x100
>>> [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36247.367826]  ? __switch_to_asm+0x34/0x70
>>> [36247.367827]  ? __switch_to_asm+0x40/0x70
>>> [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>> [36247.367848]  ? lookup_fast+0xc8/0x2e0
>>> [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
>>> [36247.367861]  ? lookup_fast+0xc8/0x2e0
>>> [36247.367863]  ? ___cache_free+0x31/0x2e0
>>> [36247.367865]  ? terminate_walk+0x93/0x100
>>> [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
>>> [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
>>> [36247.367894]  statfs_by_dentry+0x6a/0x90
>>> [36247.367895]  vfs_statfs+0x16/0xc0
>>> [36247.367897]  user_statfs+0x50/0xa0
>>> [36247.367898]  __do_sys_statfs+0x20/0x50
>>> [36247.367902]  do_syscall_64+0x55/0x100
>>> [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36247.367905] RIP: 0033:0x4a5c20
>>> [36247.367936] Code: Bad RIP value.
>>> [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>> [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>> [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>> [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>> [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>> [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>> [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>> [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
>>> [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>> [36247.368021] Call Trace:
>>> [36247.368023]  ? __schedule+0x338/0x850
>>> [36247.368025]  schedule+0x3c/0x90
>>> [36247.368026]  schedule_preempt_disabled+0x14/0x20
>>> [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
>>> [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>> [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>> [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36247.368055]  ? _raw_spin_unlock+0x16/0x30
>>> [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>> [36247.368073]  ? _raw_spin_unlock+0x16/0x30
>>> [36247.368074]  ? preempt_count_add+0x67/0xb0
>>> [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>> [36247.368090]  process_one_work+0x188/0x3a0
>>> [36247.368092]  worker_thread+0x4c/0x3a0
>>> [36247.368094]  ? preempt_count_add+0x67/0xb0
>>> [36247.368095]  ? process_one_work+0x3a0/0x3a0
>>> [36247.368097]  kthread+0xf8/0x130
>>> [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
>>> [36247.368101]  ret_from_fork+0x35/0x40
>>> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
>>> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36247.368158] puppet          D    0  3836    541 0x00000000
>>> [36247.368159] Call Trace:
>>> [36247.368161]  ? __schedule+0x338/0x850
>>> [36247.368163]  schedule+0x3c/0x90
>>> [36247.368164]  schedule_preempt_disabled+0x14/0x20
>>> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
>>> [36247.368166]  ? __switch_to_asm+0x34/0x70
>>> [36247.368167]  ? __switch_to_asm+0x40/0x70
>>> [36247.368169]  ? try_module_get+0x68/0x100
>>> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
>>> [36247.368189]  ? finish_task_switch+0x7d/0x290
>>> [36247.368190]  ? __switch_to_asm+0x40/0x70
>>> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
>>> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36247.368209]  ? _raw_spin_lock+0x13/0x30
>>> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
>>> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
>>> [36247.368238]  ? sched_clock+0x5/0x10
>>> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
>>> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
>>> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
>>> [36247.368269]  ? walk_component+0x48/0x360
>>> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>> [36247.368286]  ? filename_lookup+0xf8/0x1a0
>>> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
>>> [36247.368313]  vfs_statx+0x89/0xe0
>>> [36247.368315]  __do_sys_newlstat+0x39/0x70
>>> [36247.368318]  do_syscall_64+0x55/0x100
>>> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36247.368321] RIP: 0033:0x7f562d3a0335
>>> [36247.368323] Code: Bad RIP value.
>>> [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>> [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>> [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>> [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>> [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>> [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>> [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>> [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36368.187097] node_exporter   D    0  3567      1 0x00000000
>>> [36368.187102] Call Trace:
>>> [36368.187117]  ? __schedule+0x338/0x850
>>> [36368.187123]  ? preempt_count_add+0x67/0xb0
>>> [36368.187128]  schedule+0x3c/0x90
>>> [36368.187133]  schedule_preempt_disabled+0x14/0x20
>>> [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
>>> [36368.187175]  ? try_module_get+0x68/0x100
>>> [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36368.187304]  ? __switch_to_asm+0x34/0x70
>>> [36368.187307]  ? __switch_to_asm+0x40/0x70
>>> [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>> [36368.187375]  ? lookup_fast+0xc8/0x2e0
>>> [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
>>> [36368.187410]  ? lookup_fast+0xc8/0x2e0
>>> [36368.187416]  ? ___cache_free+0x31/0x2e0
>>> [36368.187421]  ? terminate_walk+0x93/0x100
>>> [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
>>> [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
>>> [36368.187477]  statfs_by_dentry+0x6a/0x90
>>> [36368.187482]  vfs_statfs+0x16/0xc0
>>> [36368.187486]  user_statfs+0x50/0xa0
>>> [36368.187491]  __do_sys_statfs+0x20/0x50
>>> [36368.187499]  do_syscall_64+0x55/0x100
>>> [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36368.187507] RIP: 0033:0x4a5c20
>>> [36368.187517] Code: Bad RIP value.
>>> [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>> [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>> [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>> [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>> [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>> [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>> [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>> [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
>>> [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>> [36368.187773] Call Trace:
>>> [36368.187780]  ? __schedule+0x338/0x850
>>> [36368.187785]  schedule+0x3c/0x90
>>> [36368.187790]  schedule_preempt_disabled+0x14/0x20
>>> [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
>>> [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>> [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>> [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36368.187885]  ? _raw_spin_unlock+0x16/0x30
>>> [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>> [36368.187942]  ? _raw_spin_unlock+0x16/0x30
>>> [36368.187947]  ? preempt_count_add+0x67/0xb0
>>> [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>> [36368.187984]  process_one_work+0x188/0x3a0
>>> [36368.187990]  worker_thread+0x4c/0x3a0
>>> [36368.187994]  ? preempt_count_add+0x67/0xb0
>>> [36368.187998]  ? process_one_work+0x3a0/0x3a0
>>> [36368.188001]  kthread+0xf8/0x130
>>> [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
>>> [36368.188009]  ret_from_fork+0x35/0x40
>>> [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
>>> [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36368.188182] puppet          D    0  3836    541 0x00000000
>>> [36368.188186] Call Trace:
>>> [36368.188191]  ? __schedule+0x338/0x850
>>> [36368.188196]  schedule+0x3c/0x90
>>> [36368.188200]  schedule_preempt_disabled+0x14/0x20
>>> [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
>>> [36368.188208]  ? __switch_to_asm+0x34/0x70
>>> [36368.188211]  ? __switch_to_asm+0x40/0x70
>>> [36368.188215]  ? try_module_get+0x68/0x100
>>> [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
>>> [36368.188280]  ? finish_task_switch+0x7d/0x290
>>> [36368.188283]  ? __switch_to_asm+0x40/0x70
>>> [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
>>> [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36368.188346]  ? _raw_spin_lock+0x13/0x30
>>> [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>> [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
>>> [36368.188432]  SMB2_open+0x150/0x520 [cifs]
>>> [36368.188440]  ? sched_clock+0x5/0x10
>>> [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
>>> [36368.188497]  open_shroot+0x12f/0x200 [cifs]
>>> [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>> [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
>>> [36368.188539]  ? walk_component+0x48/0x360
>>> [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>> [36368.188574]  ? filename_lookup+0xf8/0x1a0
>>> [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>> [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>> [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
>>> [36368.188664]  vfs_statx+0x89/0xe0
>>> [36368.188670]  __do_sys_newlstat+0x39/0x70
>>> [36368.188677]  do_syscall_64+0x55/0x100
>>> [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36368.188684] RIP: 0033:0x7f562d3a0335
>>> [36368.188689] Code: Bad RIP value.
>>> [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>> [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>> [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>> [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>> [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>> [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>> [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>> [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36489.006458] node_exporter   D    0  3567      1 0x00000000
>>> [36489.006461] Call Trace:
>>> [36489.006470]  ? __schedule+0x338/0x850
>>> [36489.006474]  ? preempt_count_add+0x67/0xb0
>>> [36489.006476]  schedule+0x3c/0x90
>>> [36489.006478]  schedule_preempt_disabled+0x14/0x20
>>> [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
>>> [36489.006485]  ? try_module_get+0x68/0x100
>>> [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36489.006549]  ? __switch_to_asm+0x34/0x70
>>> [36489.006550]  ? __switch_to_asm+0x40/0x70
>>> [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>> [36489.006583]  ? lookup_fast+0xc8/0x2e0
>>> [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
>>> [36489.006599]  ? lookup_fast+0xc8/0x2e0
>>> [36489.006602]  ? ___cache_free+0x31/0x2e0
>>> [36489.006604]  ? terminate_walk+0x93/0x100
>>> [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
>>> [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
>>> [36489.006635]  statfs_by_dentry+0x6a/0x90
>>> [36489.006637]  vfs_statfs+0x16/0xc0
>>> [36489.006639]  user_statfs+0x50/0xa0
>>> [36489.006641]  __do_sys_statfs+0x20/0x50
>>> [36489.006645]  do_syscall_64+0x55/0x100
>>> [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36489.006649] RIP: 0033:0x4a5c20
>>> [36489.006654] Code: Bad RIP value.
>>> [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>> [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>> [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>> [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>> [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>> [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>> [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>> [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
>>> [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>> [36489.007101] Call Trace:
>>> [36489.007105]  ? __schedule+0x338/0x850
>>> [36489.007107]  schedule+0x3c/0x90
>>> [36489.007109]  schedule_preempt_disabled+0x14/0x20
>>> [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
>>> [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>> [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>> [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36489.007143]  ? _raw_spin_unlock+0x16/0x30
>>> [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>> [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>> [36489.007161]  ? _raw_spin_unlock+0x16/0x30
>>> [36489.007163]  ? preempt_count_add+0x67/0xb0
>>> [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>> [36489.007176]  process_one_work+0x188/0x3a0
>>> [36489.007178]  worker_thread+0x4c/0x3a0
>>> [36489.007179]  ? preempt_count_add+0x67/0xb0
>>> [36489.007180]  ? process_one_work+0x3a0/0x3a0
>>> [36489.007181]  kthread+0xf8/0x130
>>> [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
>>> [36489.007184]  ret_from_fork+0x35/0x40
>>> [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
>>> [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36489.007540] puppet          D    0  3836    541 0x00000000
>>> [36489.007541] Call Trace:
>>> [36489.007545]  ? __schedule+0x338/0x850
>>> [36489.007548]  schedule+0x3c/0x90
>>> [36489.007550]  schedule_preempt_disabled+0x14/0x20
>>> [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
>>> [36489.007553]  ? __switch_to_asm+0x34/0x70
>>> [36489.007555]  ? __switch_to_asm+0x40/0x70
>>> [36489.007558]  ? try_module_get+0x68/0x100
>>> [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
>>> [36489.007598]  ? finish_task_switch+0x7d/0x290
>>> [36489.007600]  ? __switch_to_asm+0x40/0x70
>>> [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
>>> [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36489.007630]  ? _raw_spin_lock+0x13/0x30
>>> [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>> [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
>>> [36489.007672]  SMB2_open+0x150/0x520 [cifs]
>>> [36489.007677]  ? sched_clock+0x5/0x10
>>> [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
>>> [36489.007707]  open_shroot+0x12f/0x200 [cifs]
>>> [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>> [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
>>> [36489.007729]  ? walk_component+0x48/0x360
>>> [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>> [36489.007748]  ? filename_lookup+0xf8/0x1a0
>>> [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>> [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>> [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
>>> [36489.007794]  vfs_statx+0x89/0xe0
>>> [36489.007797]  __do_sys_newlstat+0x39/0x70
>>> [36489.007801]  do_syscall_64+0x55/0x100
>>> [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36489.007805] RIP: 0033:0x7f562d3a0335
>>> [36489.007808] Code: Bad RIP value.
>>> [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>> [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>> [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>> [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>> [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>> [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>> [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>> [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [36609.825908] node_exporter   D    0  3567      1 0x00000000
>>> [36609.825910] Call Trace:
>>> [36609.825919]  ? __schedule+0x338/0x850
>>> [36609.825922]  ? preempt_count_add+0x67/0xb0
>>> [36609.825925]  schedule+0x3c/0x90
>>> [36609.825927]  schedule_preempt_disabled+0x14/0x20
>>> [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
>>> [36609.825933]  ? try_module_get+0x68/0x100
>>> [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36609.826001]  ? __switch_to_asm+0x34/0x70
>>> [36609.826002]  ? __switch_to_asm+0x40/0x70
>>> [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>> [36609.826040]  ? lookup_fast+0xc8/0x2e0
>>> [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
>>> [36609.826060]  ? lookup_fast+0xc8/0x2e0
>>> [36609.826064]  ? ___cache_free+0x31/0x2e0
>>> [36609.826066]  ? terminate_walk+0x93/0x100
>>> [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
>>> [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
>>> [36609.826103]  statfs_by_dentry+0x6a/0x90
>>> [36609.826104]  vfs_statfs+0x16/0xc0
>>> [36609.826106]  user_statfs+0x50/0xa0
>>> [36609.826108]  __do_sys_statfs+0x20/0x50
>>> [36609.826113]  do_syscall_64+0x55/0x100
>>> [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36609.826117] RIP: 0033:0x4a5c20
>>> [36609.826123] Code: Bad RIP value.
>>> [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>> [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>> [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>> [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>> [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>> [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>> [73117.654990] CIFS VFS: Send error in SessSetup = -126
>>>
>>> Regards, Martijn
>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Pavel Shilovsky
>>>>
>>>> чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
>>>> <martijn.de.gouw@prodrive-technologies.com>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 04-07-2019 13:22, Aurélien Aptel wrote:
>>>>>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>>>>>>>> Are there any kernel oops/panic with stack traces and register dumps in
>>>>>>>> the log?
>>>>>>>>
>>>>>>>> You can inspect the kernel stack trace of the hung processes (to see where
>>>>>>>> they are stuck) by printing the file /proc/<pid>/stack.
>>>>>>>
>>>>>>> These are the stacks of all processes that are D, most of them being df.
>>>>>>> I also attached the cifs Stats output below.
>>>>>>
>>>>>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
>>>>>> any?
>>>>>
>>>>> I did find the following messages in the dmesg of one of the servers:
>>>>>
>>>>> [    4.797893] FS-Cache: Duplicate cookie detected
>>>>> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>>> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>>> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>>> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
>>>>> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>> [    4.798643] FS-Cache: Duplicate cookie detected
>>>>> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>>> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>>> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>>> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
>>>>> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
>>>>> [12738.729173] CIFS VFS: Send error in SessSetup = -126
>>>>> [99125.480751] CIFS VFS: Send error in SessSetup = -126
>>>>> [185517.295175] CIFS VFS: Send error in SessSetup = -126
>>>>> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>>> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
>>>>> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
>>>>> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
>>>>> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>>> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
>>>>> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>>> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>>> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>>> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>>> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>>> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>>> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>>> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>>> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>>> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [250515.750120] Call Trace:
>>>>> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>>> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>>> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
>>>>> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
>>>>> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
>>>>> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>> [250515.750259]  ? finish_task_switch+0x7d/0x290
>>>>> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
>>>>> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>>> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>> [250515.751200]  ? kthread+0xf8/0x130
>>>>> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>>> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
>>>>> [250515.752997]  ? ret_from_fork+0x35/0x40
>>>>> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
>>>>> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
>>>>> [250515.754766] CR2: ffff8ec52a6fe1d0
>>>>> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>> [250515.759389] ---[ end trace 92ea62cd080150de ]---
>>>>> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
>>>>> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
>>>>> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>>> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>>> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>>> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
>>>>> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>>> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>>> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>>> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>>> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>>> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>>> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
>>>>> [271909.743603] CIFS VFS: Send error in SessSetup = -126
>>>>> [358301.908585] CIFS VFS: Send error in SessSetup = -126
>>>>> [444693.566453] CIFS VFS: Send error in SessSetup = -126
>>>>> [531086.090040] CIFS VFS: Send error in SessSetup = -126
>>>>> [617477.785390] CIFS VFS: Send error in SessSetup = -126
>>>>> [703869.556705] CIFS VFS: Send error in SessSetup = -126
>>>>> [790261.656853] CIFS VFS: Send error in SessSetup = -126
>>>>> [876653.496928] CIFS VFS: Send error in SessSetup = -126
>>>>> [963045.816742] CIFS VFS: Send error in SessSetup = -126
>>>>> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
>>>>>
>>>>>
>>>>> And from another server:
>>>>> [    4.253091] FS-Cache: Duplicate cookie detected
>>>>> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>>> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>>> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>>> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
>>>>> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>> [    4.254107] FS-Cache: Duplicate cookie detected
>>>>> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>>> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>>> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>>> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
>>>>> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
>>>>> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
>>>>> [65206.127542] CIFS VFS: Send error in SessSetup = -126
>>>>> [151597.808064] CIFS VFS: Send error in SessSetup = -126
>>>>> [237989.956447] CIFS VFS: Send error in SessSetup = -126
>>>>> [324380.937984] CIFS VFS: Send error in SessSetup = -126
>>>>> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>>> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
>>>>> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
>>>>> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
>>>>> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>>> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>> [402750.869947] RIP: 0010:0xffff909bf61609d0
>>>>> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>>> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>>> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
>>>>> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>>> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>>> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>>> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>>> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
>>>>> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>>> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>>> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>>> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>>> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>>> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>>> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
>>>>> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>>> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
>>>>> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [402750.872324] Call Trace:
>>>>> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>>> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
>>>>> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
>>>>> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
>>>>> [402750.876555]  ? try_to_wake_up+0x54/0x540
>>>>> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
>>>>> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
>>>>> [402750.877834]  ? finish_task_switch+0x7d/0x290
>>>>> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>>> [402750.878651]  ? kthread+0xf8/0x130
>>>>> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> [402750.879412]  ? ret_from_fork+0x35/0x40
>>>>> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
>>>>> [402750.883384] CR2: ffff909bf61609d0
>>>>> [402750.883783] ---[ end trace 08b06875e82513eb ]---
>>>>> [402750.884200] RIP: 0010:0xffff909bf61609d0
>>>>> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>>> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>>> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>>> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>>> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>>> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>>> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>>> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>>> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>>> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [410772.901264] CIFS VFS: Send error in SessSetup = -126
>>>>> [497165.194662] CIFS VFS: Send error in SessSetup = -126
>>>>> [583557.492304] CIFS VFS: Send error in SessSetup = -126
>>>>> [669948.937016] CIFS VFS: Send error in SessSetup = -126
>>>>> [756341.390112] CIFS VFS: Send error in SessSetup = -126
>>>>> [842733.557002] CIFS VFS: Send error in SessSetup = -126
>>>>> [929125.303428] CIFS VFS: Send error in SessSetup = -126
>>>>> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
>>>>> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
>>>>> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
>>>>> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
>>>>>
>>>>>
>>>>>>
>>>>>> The individual stack dumps are pretty useful. Here is my theory:
>>>>>>
>>>>>>> pid: 9505
>>>>>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
>>>>>>> 0x7ffede42e8f8 0x7f7f8928f295
>>>>>>> [<0>] open_shroot+0x43/0x200 [cifs]
>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>
>>>>>> Almost all of the processes have the same stack trace. They are stuck at
>>>>>> open_shroot()+0x43 which is probably
>>>>>>
>>>>>>        mutex_lock(&tcon->crfid.fid_mutex);
>>>>>>
>>>>>> Then there are only 2 other processes stuck somewhere in the same code path
>>>>>> (open_shroot) but deeper, meaning they have the locks that the other
>>>>>> processes are waiting for:
>>>>>>
>>>>>>
>>>>>>> pid: 22858
>>>>>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
>>>>>>> 0x7ffcea3f99d8 0x7f6cc78c7295
>>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>>
>>>>>>> pid: 20027
>>>>>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
>>>>>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
>>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>> Due to timeouts maybe the Open request needs
>>>>>> to reconnect the server/ses/tcon and to do this it calls
>>>>>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
>>>>>>
>>>>>>            spin_lock(&tcon->open_file_lock);
>>>>>>            list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>>>>>>                    open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>>>>>>                    open_file->invalidHandle = true;
>>>>>>                    open_file->oplock_break_cancelled = true;
>>>>>>            }
>>>>>>            spin_unlock(&tcon->open_file_lock);
>>>>>>
>>>>>>            mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>>>>>>            tcon->crfid.is_valid = false;
>>>>>>            memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>>>>>>            mutex_unlock(&tcon->crfid.fid_mutex);
>>>>>>
>>>>>> I think these processes are trying to lock the same lock twice: one in
>>>>>> open_shroot() and since Open ends up having to reconnect, once again in
>>>>>> mark_open_files_invalid(). I think it's the same lock because I don't
>>>>>> see why the tcon pointers would be different in those 2 spots. Kernel
>>>>>> mutexes are not reentrant so this is a deadlock.
>>>>>
>>>>> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
>>>>>
>>>>> Regards, Martijn
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>
>>>>> --
>>>>> Martijn de Gouw
>>>>> Designer
>>>>> Prodrive Technologies
>>>>> Mobile: +31 63 17 76 161
>>>>> Phone:  +31 40 26 76 200
>>>
>>> --
>>> Martijn de Gouw
>>> Designer
>>> Prodrive Technologies
>>> Mobile: +31 63 17 76 161
>>> Phone:  +31 40 26 76 200
>>
>>
>>
>> --
>> Thanks,
>>
>> Steve

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-09  8:48                 ` Martijn de Gouw
@ 2019-07-15 23:40                   ` Pavel Shilovsky
  2019-08-07  7:34                     ` Martijn de Gouw
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Shilovsky @ 2019-07-15 23:40 UTC (permalink / raw)
  To: Martijn de Gouw; +Cc: Steve French, Aurélien Aptel, linux-cifs

Thanks for letting us know.

This looks like a memory problem - similar to the one reported in
"General protection faults in cifs_reconnect..." thread. It doesn't
seem like encryption is used here, so the problem is somewhere else.

From the stack trace the kernel is 4.20.17, so it has many stable
patches merged from the crediting series. Could you try older kernels?
Like 4.20.2, 4.20.5 and 4.20.7 (all 3h have different set of fixes
applied).

--
Best regards,
Pavel Shilovsky

вт, 9 июл. 2019 г. в 01:48, Martijn de Gouw
<martijn.de.gouw@prodrive-technologies.com>:
>
> Hi,
>
> Here is another kernel log, but it looks like a different issue.
> All cifs mounts have the nohandlecache option set.
> This server is running the kernel without the V2 patch:
>
> [706049.499153] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [706049.499183] BUG: unable to handle kernel paging request at ffff8e6381d549d0
> [706049.499222] PGD 175c02067 P4D 175c02067 PUD 262470063 PMD 8000000101c000e3
> [706049.499245] Oops: 0011 [#1] PREEMPT SMP PTI
> [706049.499258] CPU: 0 PID: 77140 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> [706049.499280] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> [706049.499320] RIP: 0010:0xffff8e6381d549d0
> [706049.499332] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
> [706049.499374] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
> [706049.499388] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
> [706049.499405] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
> [706049.499422] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
> [706049.499439] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
> [706049.499456] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
> [706049.499473] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
> [706049.499495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [706049.499516] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
> [706049.499589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [706049.499607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [706049.499625] Call Trace:
> [706049.499934] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [706049.500076]  ? cifs_reconnect+0x337/0x880 [cifs]
> [706049.500764] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> [706049.501157]  ? cifs_handle_standard+0x169/0x190 [cifs]
> [706049.501166]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
> [706049.501867] 00000010: 00000001 00000098 0000046b 00000000  ........k.......
> [706049.502259]  ? finish_task_switch+0x7d/0x290
> [706049.502980] 00000020: 000002c3 00000001 00000905 00007400  .............t..
> [706049.503352]  ? cifs_handle_standard+0x190/0x190 [cifs]
> [706049.504057] 00000030: 00000000 00000000 00000000 00000000  ................
> [706049.504475]  ? kthread+0xf8/0x130
> [706049.504478]  ? kthread_create_worker_on_cpu+0x70/0x70
> [706049.505247] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [706049.505681]  ? ret_from_fork+0x35/0x40
> [706049.506498] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> [706049.506926] Modules linked in: dm_snapshot(E) dm_bufio(E) cpufreq_userspace(E) cpufreq_conservative(E) cpufreq_powersave(E) arc4(E) ecb(E) nfsv3(E) md4(E) nfs_acl(E) sha512_ssse3(E) nfs(E) sha512_generic(E) cmac(E) hmac(E) lockd(E) nls_utf8(E) cifs(E) grace(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) vmwgfx(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) sg(E) cryptd(E) joydev(E) evdev(E) vmw_balloon(E) drm(E) glue_helper(E) vmw_vmci(E) serio_raw(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) libata(E) vmw_pvscsi(E) crc32c_intel(E) psmouse(E) e1000e(E) scsi_mod(E) i2c_piix4(E)
> [706049.507729] 00000010: 00000005 00000068 0000046c 00000000  ....h...l.......
> [706049.508166] CR2: ffff8e6381d549d0
> [706049.508965] 00000020: 000002c3 00000001 00000905 00007400  .............t..
> [706049.513140] ---[ end trace e371dfb71e7921db ]---
> [706049.513143] RIP: 0010:0xffff8e6381d549d0
> [706049.513144] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
> [706049.513145] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
> [706049.513147] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
> [706049.513147] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
> [706049.513148] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
> [706049.513148] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
> [706049.513149] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
> [706049.513150] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
> [706049.513151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [706049.513151] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
> [706049.513201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [706049.521671] 00000030: 00000000 00000000 00000000 00000000  ................
> [706049.522058] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [706049.523846] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> [706049.524699] 00000000: 424d53fe 00010040 00000000 00060006  .SMB@...........
> [706049.525574] 00000010: 00000005 00000000 0000046d 00000000  ........m.......
> [706049.526449] 00000020: 000002c3 00000001 00000905 00007400  .............t..
> [706049.527304] 00000030: 00000000 00000000 00000000 00000000  ................
> [707493.825696] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
> [707615.670866] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
> [707737.516154] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
>
> Regards, Martijn
>
> On 06-07-2019 18:40, Pavel Shilovsky wrote:
> > Martijn, thanks for the traces. This is a deadlock as mentioned by Aurelien:
> >
> > [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
> > [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> > [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [36247.368158] puppet          D    0  3836    541 0x00000000
> > [36247.368159] Call Trace:
> > [36247.368161]  ? __schedule+0x338/0x850
> > [36247.368163]  schedule+0x3c/0x90
> > [36247.368164]  schedule_preempt_disabled+0x14/0x20
> > [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
> > [36247.368166]  ? __switch_to_asm+0x34/0x70
> > [36247.368167]  ? __switch_to_asm+0x40/0x70
> > [36247.368169]  ? try_module_get+0x68/0x100
> > [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> > [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
> > [36247.368189]  ? finish_task_switch+0x7d/0x290
> > [36247.368190]  ? __switch_to_asm+0x40/0x70
> > [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
> > [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
> > [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
> > [36247.368209]  ? _raw_spin_lock+0x13/0x30
> > [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> > [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
> > [36247.368235]  SMB2_open+0x150/0x520 [cifs]
> > [36247.368238]  ? sched_clock+0x5/0x10
> > [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
> > [36247.368255]  open_shroot+0x12f/0x200 [cifs]
> > [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> > [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
> > [36247.368269]  ? walk_component+0x48/0x360
> > [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
> > [36247.368286]  ? filename_lookup+0xf8/0x1a0
> > [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> > [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> > [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
> > [36247.368313]  vfs_statx+0x89/0xe0
> > [36247.368315]  __do_sys_newlstat+0x39/0x70
> > [36247.368318]  do_syscall_64+0x55/0x100
> > [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [36247.368321] RIP: 0033:0x7f562d3a0335
> > [36247.368323] Code: Bad RIP value.
> >
> > This looks urgent to fix - we have a parallel thread with the similar
> > problem observed on 5.1 kernel. In order to fix it we need to drop
> > open shroot mutex before going into SMB2_open_init and other PDU
> > functions.
> >
> > You can try to work around this by specifying "nohandlecache" mount option.
> >
> > --
> > Best regards,
> > Pavel Shilovsky
> >
> > сб, 6 июл. 2019 г. в 06:05, Steve French <smfrench@gmail.com>:
> >>
> >> Are we sure this isn't something really simple like the server IP
> >> address changing?  Remember in 5.0 kernel (also updated cifs-utils)
> >> the code to allow failover when server's IP address changes was added.
> >>
> >> On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
> >> <martijn.de.gouw@prodrive-technologies.com> wrote:
> >>>
> >>> Hi Pavel,
> >>>
> >>> On 04-07-2019 23:34, Pavel Shilovsky wrote:
> >>>> Hi Martijn,
> >>>>
> >>>> Thanks for reporting the problem. Have you tried v5.1.5 kernel and
> >>>> above? That one has many reconnect fixes. If yes, could you please
> >>>> provide the kernel stack / panic traces when running new versions of
> >>>> the kernel?
> >>>
> >>> I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
> >>>
> >>> Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
> >>> performed on these mounts is the check if they are still mounted (by puppet).
> >>>
> >>>
> >>> [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> >>> [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36247.367648] node_exporter   D    0  3567      1 0x00000000
> >>> [36247.367650] Call Trace:
> >>> [36247.367668]  ? __schedule+0x338/0x850
> >>> [36247.367671]  ? preempt_count_add+0x67/0xb0
> >>> [36247.367673]  schedule+0x3c/0x90
> >>> [36247.367675]  schedule_preempt_disabled+0x14/0x20
> >>> [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36247.367680]  ? try_module_get+0x68/0x100
> >>> [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36247.367826]  ? __switch_to_asm+0x34/0x70
> >>> [36247.367827]  ? __switch_to_asm+0x40/0x70
> >>> [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36247.367848]  ? lookup_fast+0xc8/0x2e0
> >>> [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36247.367861]  ? lookup_fast+0xc8/0x2e0
> >>> [36247.367863]  ? ___cache_free+0x31/0x2e0
> >>> [36247.367865]  ? terminate_walk+0x93/0x100
> >>> [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
> >>> [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
> >>> [36247.367894]  statfs_by_dentry+0x6a/0x90
> >>> [36247.367895]  vfs_statfs+0x16/0xc0
> >>> [36247.367897]  user_statfs+0x50/0xa0
> >>> [36247.367898]  __do_sys_statfs+0x20/0x50
> >>> [36247.367902]  do_syscall_64+0x55/0x100
> >>> [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36247.367905] RIP: 0033:0x4a5c20
> >>> [36247.367936] Code: Bad RIP value.
> >>> [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> >>> [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> >>> [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> >>> [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> >>> [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> >>> [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> >>> [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> >>> [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
> >>> [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
> >>> [36247.368021] Call Trace:
> >>> [36247.368023]  ? __schedule+0x338/0x850
> >>> [36247.368025]  schedule+0x3c/0x90
> >>> [36247.368026]  schedule_preempt_disabled+0x14/0x20
> >>> [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> >>> [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> >>> [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36247.368055]  ? _raw_spin_unlock+0x16/0x30
> >>> [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>> [36247.368073]  ? _raw_spin_unlock+0x16/0x30
> >>> [36247.368074]  ? preempt_count_add+0x67/0xb0
> >>> [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> >>> [36247.368090]  process_one_work+0x188/0x3a0
> >>> [36247.368092]  worker_thread+0x4c/0x3a0
> >>> [36247.368094]  ? preempt_count_add+0x67/0xb0
> >>> [36247.368095]  ? process_one_work+0x3a0/0x3a0
> >>> [36247.368097]  kthread+0xf8/0x130
> >>> [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
> >>> [36247.368101]  ret_from_fork+0x35/0x40
> >>> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
> >>> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36247.368158] puppet          D    0  3836    541 0x00000000
> >>> [36247.368159] Call Trace:
> >>> [36247.368161]  ? __schedule+0x338/0x850
> >>> [36247.368163]  schedule+0x3c/0x90
> >>> [36247.368164]  schedule_preempt_disabled+0x14/0x20
> >>> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36247.368166]  ? __switch_to_asm+0x34/0x70
> >>> [36247.368167]  ? __switch_to_asm+0x40/0x70
> >>> [36247.368169]  ? try_module_get+0x68/0x100
> >>> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
> >>> [36247.368189]  ? finish_task_switch+0x7d/0x290
> >>> [36247.368190]  ? __switch_to_asm+0x40/0x70
> >>> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
> >>> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36247.368209]  ? _raw_spin_lock+0x13/0x30
> >>> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> >>> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
> >>> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
> >>> [36247.368238]  ? sched_clock+0x5/0x10
> >>> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
> >>> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
> >>> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> >>> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
> >>> [36247.368269]  ? walk_component+0x48/0x360
> >>> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
> >>> [36247.368286]  ? filename_lookup+0xf8/0x1a0
> >>> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> >>> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
> >>> [36247.368313]  vfs_statx+0x89/0xe0
> >>> [36247.368315]  __do_sys_newlstat+0x39/0x70
> >>> [36247.368318]  do_syscall_64+0x55/0x100
> >>> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36247.368321] RIP: 0033:0x7f562d3a0335
> >>> [36247.368323] Code: Bad RIP value.
> >>> [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> >>> [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> >>> [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> >>> [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> >>> [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> >>> [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> >>> [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> >>> [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36368.187097] node_exporter   D    0  3567      1 0x00000000
> >>> [36368.187102] Call Trace:
> >>> [36368.187117]  ? __schedule+0x338/0x850
> >>> [36368.187123]  ? preempt_count_add+0x67/0xb0
> >>> [36368.187128]  schedule+0x3c/0x90
> >>> [36368.187133]  schedule_preempt_disabled+0x14/0x20
> >>> [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36368.187175]  ? try_module_get+0x68/0x100
> >>> [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36368.187304]  ? __switch_to_asm+0x34/0x70
> >>> [36368.187307]  ? __switch_to_asm+0x40/0x70
> >>> [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36368.187375]  ? lookup_fast+0xc8/0x2e0
> >>> [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36368.187410]  ? lookup_fast+0xc8/0x2e0
> >>> [36368.187416]  ? ___cache_free+0x31/0x2e0
> >>> [36368.187421]  ? terminate_walk+0x93/0x100
> >>> [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
> >>> [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
> >>> [36368.187477]  statfs_by_dentry+0x6a/0x90
> >>> [36368.187482]  vfs_statfs+0x16/0xc0
> >>> [36368.187486]  user_statfs+0x50/0xa0
> >>> [36368.187491]  __do_sys_statfs+0x20/0x50
> >>> [36368.187499]  do_syscall_64+0x55/0x100
> >>> [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36368.187507] RIP: 0033:0x4a5c20
> >>> [36368.187517] Code: Bad RIP value.
> >>> [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> >>> [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> >>> [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> >>> [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> >>> [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> >>> [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> >>> [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> >>> [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
> >>> [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
> >>> [36368.187773] Call Trace:
> >>> [36368.187780]  ? __schedule+0x338/0x850
> >>> [36368.187785]  schedule+0x3c/0x90
> >>> [36368.187790]  schedule_preempt_disabled+0x14/0x20
> >>> [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> >>> [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> >>> [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36368.187885]  ? _raw_spin_unlock+0x16/0x30
> >>> [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>> [36368.187942]  ? _raw_spin_unlock+0x16/0x30
> >>> [36368.187947]  ? preempt_count_add+0x67/0xb0
> >>> [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> >>> [36368.187984]  process_one_work+0x188/0x3a0
> >>> [36368.187990]  worker_thread+0x4c/0x3a0
> >>> [36368.187994]  ? preempt_count_add+0x67/0xb0
> >>> [36368.187998]  ? process_one_work+0x3a0/0x3a0
> >>> [36368.188001]  kthread+0xf8/0x130
> >>> [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
> >>> [36368.188009]  ret_from_fork+0x35/0x40
> >>> [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
> >>> [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36368.188182] puppet          D    0  3836    541 0x00000000
> >>> [36368.188186] Call Trace:
> >>> [36368.188191]  ? __schedule+0x338/0x850
> >>> [36368.188196]  schedule+0x3c/0x90
> >>> [36368.188200]  schedule_preempt_disabled+0x14/0x20
> >>> [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36368.188208]  ? __switch_to_asm+0x34/0x70
> >>> [36368.188211]  ? __switch_to_asm+0x40/0x70
> >>> [36368.188215]  ? try_module_get+0x68/0x100
> >>> [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
> >>> [36368.188280]  ? finish_task_switch+0x7d/0x290
> >>> [36368.188283]  ? __switch_to_asm+0x40/0x70
> >>> [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
> >>> [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36368.188346]  ? _raw_spin_lock+0x13/0x30
> >>> [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> >>> [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
> >>> [36368.188432]  SMB2_open+0x150/0x520 [cifs]
> >>> [36368.188440]  ? sched_clock+0x5/0x10
> >>> [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
> >>> [36368.188497]  open_shroot+0x12f/0x200 [cifs]
> >>> [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> >>> [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
> >>> [36368.188539]  ? walk_component+0x48/0x360
> >>> [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
> >>> [36368.188574]  ? filename_lookup+0xf8/0x1a0
> >>> [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> >>> [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>> [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
> >>> [36368.188664]  vfs_statx+0x89/0xe0
> >>> [36368.188670]  __do_sys_newlstat+0x39/0x70
> >>> [36368.188677]  do_syscall_64+0x55/0x100
> >>> [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36368.188684] RIP: 0033:0x7f562d3a0335
> >>> [36368.188689] Code: Bad RIP value.
> >>> [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> >>> [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> >>> [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> >>> [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> >>> [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> >>> [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> >>> [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> >>> [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36489.006458] node_exporter   D    0  3567      1 0x00000000
> >>> [36489.006461] Call Trace:
> >>> [36489.006470]  ? __schedule+0x338/0x850
> >>> [36489.006474]  ? preempt_count_add+0x67/0xb0
> >>> [36489.006476]  schedule+0x3c/0x90
> >>> [36489.006478]  schedule_preempt_disabled+0x14/0x20
> >>> [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36489.006485]  ? try_module_get+0x68/0x100
> >>> [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36489.006549]  ? __switch_to_asm+0x34/0x70
> >>> [36489.006550]  ? __switch_to_asm+0x40/0x70
> >>> [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36489.006583]  ? lookup_fast+0xc8/0x2e0
> >>> [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36489.006599]  ? lookup_fast+0xc8/0x2e0
> >>> [36489.006602]  ? ___cache_free+0x31/0x2e0
> >>> [36489.006604]  ? terminate_walk+0x93/0x100
> >>> [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
> >>> [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
> >>> [36489.006635]  statfs_by_dentry+0x6a/0x90
> >>> [36489.006637]  vfs_statfs+0x16/0xc0
> >>> [36489.006639]  user_statfs+0x50/0xa0
> >>> [36489.006641]  __do_sys_statfs+0x20/0x50
> >>> [36489.006645]  do_syscall_64+0x55/0x100
> >>> [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36489.006649] RIP: 0033:0x4a5c20
> >>> [36489.006654] Code: Bad RIP value.
> >>> [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> >>> [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> >>> [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> >>> [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> >>> [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> >>> [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> >>> [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
> >>> [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
> >>> [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
> >>> [36489.007101] Call Trace:
> >>> [36489.007105]  ? __schedule+0x338/0x850
> >>> [36489.007107]  schedule+0x3c/0x90
> >>> [36489.007109]  schedule_preempt_disabled+0x14/0x20
> >>> [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
> >>> [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
> >>> [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36489.007143]  ? _raw_spin_unlock+0x16/0x30
> >>> [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>> [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>> [36489.007161]  ? _raw_spin_unlock+0x16/0x30
> >>> [36489.007163]  ? preempt_count_add+0x67/0xb0
> >>> [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
> >>> [36489.007176]  process_one_work+0x188/0x3a0
> >>> [36489.007178]  worker_thread+0x4c/0x3a0
> >>> [36489.007179]  ? preempt_count_add+0x67/0xb0
> >>> [36489.007180]  ? process_one_work+0x3a0/0x3a0
> >>> [36489.007181]  kthread+0xf8/0x130
> >>> [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
> >>> [36489.007184]  ret_from_fork+0x35/0x40
> >>> [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
> >>> [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36489.007540] puppet          D    0  3836    541 0x00000000
> >>> [36489.007541] Call Trace:
> >>> [36489.007545]  ? __schedule+0x338/0x850
> >>> [36489.007548]  schedule+0x3c/0x90
> >>> [36489.007550]  schedule_preempt_disabled+0x14/0x20
> >>> [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36489.007553]  ? __switch_to_asm+0x34/0x70
> >>> [36489.007555]  ? __switch_to_asm+0x40/0x70
> >>> [36489.007558]  ? try_module_get+0x68/0x100
> >>> [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
> >>> [36489.007598]  ? finish_task_switch+0x7d/0x290
> >>> [36489.007600]  ? __switch_to_asm+0x40/0x70
> >>> [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
> >>> [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36489.007630]  ? _raw_spin_lock+0x13/0x30
> >>> [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
> >>> [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
> >>> [36489.007672]  SMB2_open+0x150/0x520 [cifs]
> >>> [36489.007677]  ? sched_clock+0x5/0x10
> >>> [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
> >>> [36489.007707]  open_shroot+0x12f/0x200 [cifs]
> >>> [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
> >>> [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
> >>> [36489.007729]  ? walk_component+0x48/0x360
> >>> [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
> >>> [36489.007748]  ? filename_lookup+0xf8/0x1a0
> >>> [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
> >>> [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>> [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
> >>> [36489.007794]  vfs_statx+0x89/0xe0
> >>> [36489.007797]  __do_sys_newlstat+0x39/0x70
> >>> [36489.007801]  do_syscall_64+0x55/0x100
> >>> [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36489.007805] RIP: 0033:0x7f562d3a0335
> >>> [36489.007808] Code: Bad RIP value.
> >>> [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> >>> [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
> >>> [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
> >>> [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
> >>> [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> >>> [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
> >>> [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
> >>> [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>> [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [36609.825908] node_exporter   D    0  3567      1 0x00000000
> >>> [36609.825910] Call Trace:
> >>> [36609.825919]  ? __schedule+0x338/0x850
> >>> [36609.825922]  ? preempt_count_add+0x67/0xb0
> >>> [36609.825925]  schedule+0x3c/0x90
> >>> [36609.825927]  schedule_preempt_disabled+0x14/0x20
> >>> [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
> >>> [36609.825933]  ? try_module_get+0x68/0x100
> >>> [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
> >>> [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
> >>> [36609.826001]  ? __switch_to_asm+0x34/0x70
> >>> [36609.826002]  ? __switch_to_asm+0x40/0x70
> >>> [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
> >>> [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36609.826040]  ? lookup_fast+0xc8/0x2e0
> >>> [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
> >>> [36609.826060]  ? lookup_fast+0xc8/0x2e0
> >>> [36609.826064]  ? ___cache_free+0x31/0x2e0
> >>> [36609.826066]  ? terminate_walk+0x93/0x100
> >>> [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
> >>> [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
> >>> [36609.826103]  statfs_by_dentry+0x6a/0x90
> >>> [36609.826104]  vfs_statfs+0x16/0xc0
> >>> [36609.826106]  user_statfs+0x50/0xa0
> >>> [36609.826108]  __do_sys_statfs+0x20/0x50
> >>> [36609.826113]  do_syscall_64+0x55/0x100
> >>> [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [36609.826117] RIP: 0033:0x4a5c20
> >>> [36609.826123] Code: Bad RIP value.
> >>> [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
> >>> [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
> >>> [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
> >>> [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
> >>> [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
> >>> [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
> >>> [73117.654990] CIFS VFS: Send error in SessSetup = -126
> >>>
> >>> Regards, Martijn
> >>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Pavel Shilovsky
> >>>>
> >>>> чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
> >>>> <martijn.de.gouw@prodrive-technologies.com>:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On 04-07-2019 13:22, Aurélien Aptel wrote:
> >>>>>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
> >>>>>>>> Are there any kernel oops/panic with stack traces and register dumps in
> >>>>>>>> the log?
> >>>>>>>>
> >>>>>>>> You can inspect the kernel stack trace of the hung processes (to see where
> >>>>>>>> they are stuck) by printing the file /proc/<pid>/stack.
> >>>>>>>
> >>>>>>> These are the stacks of all processes that are D, most of them being df.
> >>>>>>> I also attached the cifs Stats output below.
> >>>>>>
> >>>>>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
> >>>>>> any?
> >>>>>
> >>>>> I did find the following messages in the dmesg of one of the servers:
> >>>>>
> >>>>> [    4.797893] FS-Cache: Duplicate cookie detected
> >>>>> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> >>>>> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> >>>>> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
> >>>>> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> >>>>> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
> >>>>> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
> >>>>> [    4.798643] FS-Cache: Duplicate cookie detected
> >>>>> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
> >>>>> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
> >>>>> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
> >>>>> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
> >>>>> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
> >>>>> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
> >>>>> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
> >>>>> [12738.729173] CIFS VFS: Send error in SessSetup = -126
> >>>>> [99125.480751] CIFS VFS: Send error in SessSetup = -126
> >>>>> [185517.295175] CIFS VFS: Send error in SessSetup = -126
> >>>>> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >>>>> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
> >>>>> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
> >>>>> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
> >>>>> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>>>> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> >>>>> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
> >>>>> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> >>>>> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> >>>>> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> >>>>> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> >>>>> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> >>>>> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> >>>>> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> >>>>> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> >>>>> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> >>>>> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >>>>> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> [250515.750120] Call Trace:
> >>>>> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> >>>>> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
> >>>>> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
> >>>>> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
> >>>>> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
> >>>>> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >>>>> [250515.750259]  ? finish_task_switch+0x7d/0x290
> >>>>> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
> >>>>> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
> >>>>> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >>>>> [250515.751200]  ? kthread+0xf8/0x130
> >>>>> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> >>>>> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
> >>>>> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
> >>>>> [250515.752997]  ? ret_from_fork+0x35/0x40
> >>>>> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >>>>> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
> >>>>> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
> >>>>> [250515.754766] CR2: ffff8ec52a6fe1d0
> >>>>> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >>>>> [250515.759389] ---[ end trace 92ea62cd080150de ]---
> >>>>> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
> >>>>> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
> >>>>> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
> >>>>> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
> >>>>> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
> >>>>> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
> >>>>> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
> >>>>> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
> >>>>> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
> >>>>> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
> >>>>> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
> >>>>> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
> >>>>> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
> >>>>> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
> >>>>> [271909.743603] CIFS VFS: Send error in SessSetup = -126
> >>>>> [358301.908585] CIFS VFS: Send error in SessSetup = -126
> >>>>> [444693.566453] CIFS VFS: Send error in SessSetup = -126
> >>>>> [531086.090040] CIFS VFS: Send error in SessSetup = -126
> >>>>> [617477.785390] CIFS VFS: Send error in SessSetup = -126
> >>>>> [703869.556705] CIFS VFS: Send error in SessSetup = -126
> >>>>> [790261.656853] CIFS VFS: Send error in SessSetup = -126
> >>>>> [876653.496928] CIFS VFS: Send error in SessSetup = -126
> >>>>> [963045.816742] CIFS VFS: Send error in SessSetup = -126
> >>>>> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
> >>>>>
> >>>>>
> >>>>> And from another server:
> >>>>> [    4.253091] FS-Cache: Duplicate cookie detected
> >>>>> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> >>>>> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> >>>>> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
> >>>>> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> >>>>> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
> >>>>> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
> >>>>> [    4.254107] FS-Cache: Duplicate cookie detected
> >>>>> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
> >>>>> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
> >>>>> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
> >>>>> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
> >>>>> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
> >>>>> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
> >>>>> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
> >>>>> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
> >>>>> [65206.127542] CIFS VFS: Send error in SessSetup = -126
> >>>>> [151597.808064] CIFS VFS: Send error in SessSetup = -126
> >>>>> [237989.956447] CIFS VFS: Send error in SessSetup = -126
> >>>>> [324380.937984] CIFS VFS: Send error in SessSetup = -126
> >>>>> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> >>>>> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
> >>>>> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
> >>>>> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
> >>>>> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
> >>>>> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
> >>>>> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >>>>> [402750.869947] RIP: 0010:0xffff909bf61609d0
> >>>>> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
> >>>>> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> >>>>> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
> >>>>> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> >>>>> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> >>>>> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> >>>>> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> >>>>> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
> >>>>> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> >>>>> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> >>>>> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
> >>>>> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> >>>>> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> >>>>> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
> >>>>> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> >>>>> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
> >>>>> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
> >>>>> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
> >>>>> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> [402750.872324] Call Trace:
> >>>>> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
> >>>>> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
> >>>>> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
> >>>>> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
> >>>>> [402750.876555]  ? try_to_wake_up+0x54/0x540
> >>>>> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
> >>>>> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
> >>>>> [402750.877834]  ? finish_task_switch+0x7d/0x290
> >>>>> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
> >>>>> [402750.878651]  ? kthread+0xf8/0x130
> >>>>> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
> >>>>> [402750.879412]  ? ret_from_fork+0x35/0x40
> >>>>> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
> >>>>> [402750.883384] CR2: ffff909bf61609d0
> >>>>> [402750.883783] ---[ end trace 08b06875e82513eb ]---
> >>>>> [402750.884200] RIP: 0010:0xffff909bf61609d0
> >>>>> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
> >>>>> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
> >>>>> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
> >>>>> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
> >>>>> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
> >>>>> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
> >>>>> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
> >>>>> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
> >>>>> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
> >>>>> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> [410772.901264] CIFS VFS: Send error in SessSetup = -126
> >>>>> [497165.194662] CIFS VFS: Send error in SessSetup = -126
> >>>>> [583557.492304] CIFS VFS: Send error in SessSetup = -126
> >>>>> [669948.937016] CIFS VFS: Send error in SessSetup = -126
> >>>>> [756341.390112] CIFS VFS: Send error in SessSetup = -126
> >>>>> [842733.557002] CIFS VFS: Send error in SessSetup = -126
> >>>>> [929125.303428] CIFS VFS: Send error in SessSetup = -126
> >>>>> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
> >>>>> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
> >>>>> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
> >>>>> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> The individual stack dumps are pretty useful. Here is my theory:
> >>>>>>
> >>>>>>> pid: 9505
> >>>>>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
> >>>>>>> 0x7ffede42e8f8 0x7f7f8928f295
> >>>>>>> [<0>] open_shroot+0x43/0x200 [cifs]
> >>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>>>>
> >>>>>> Almost all of the processes have the same stack trace. They are stuck at
> >>>>>> open_shroot()+0x43 which is probably
> >>>>>>
> >>>>>>        mutex_lock(&tcon->crfid.fid_mutex);
> >>>>>>
> >>>>>> Then there are only 2 other processes stuck somewhere in the same code path
> >>>>>> (open_shroot) but deeper, meaning they have the locks that the other
> >>>>>> processes are waiting for:
> >>>>>>
> >>>>>>
> >>>>>>> pid: 22858
> >>>>>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
> >>>>>>> 0x7ffcea3f99d8 0x7f6cc78c7295
> >>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> >>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> >>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >>>>>>> [<0>] vfs_statx+0x89/0xe0
> >>>>>>> [<0>] __do_sys_newstat+0x39/0x70
> >>>>>>> [<0>] do_syscall_64+0x55/0x100
> >>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>>>> [<0>] 0xffffffffffffffff
> >>>>>>
> >>>>>>
> >>>>>>> pid: 20027
> >>>>>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
> >>>>>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
> >>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
> >>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
> >>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
> >>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
> >>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
> >>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
> >>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
> >>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
> >>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
> >>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
> >>>>>>> [<0>] vfs_statx+0x89/0xe0
> >>>>>>> [<0>] __do_sys_newstat+0x39/0x70
> >>>>>>> [<0>] do_syscall_64+0x55/0x100
> >>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>>>> [<0>] 0xffffffffffffffff
> >>>>>>
> >>>>>> Due to timeouts maybe the Open request needs
> >>>>>> to reconnect the server/ses/tcon and to do this it calls
> >>>>>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
> >>>>>>
> >>>>>>            spin_lock(&tcon->open_file_lock);
> >>>>>>            list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
> >>>>>>                    open_file = list_entry(tmp, struct cifsFileInfo, tlist);
> >>>>>>                    open_file->invalidHandle = true;
> >>>>>>                    open_file->oplock_break_cancelled = true;
> >>>>>>            }
> >>>>>>            spin_unlock(&tcon->open_file_lock);
> >>>>>>
> >>>>>>            mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
> >>>>>>            tcon->crfid.is_valid = false;
> >>>>>>            memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
> >>>>>>            mutex_unlock(&tcon->crfid.fid_mutex);
> >>>>>>
> >>>>>> I think these processes are trying to lock the same lock twice: one in
> >>>>>> open_shroot() and since Open ends up having to reconnect, once again in
> >>>>>> mark_open_files_invalid(). I think it's the same lock because I don't
> >>>>>> see why the tcon pointers would be different in those 2 spots. Kernel
> >>>>>> mutexes are not reentrant so this is a deadlock.
> >>>>>
> >>>>> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
> >>>>>
> >>>>> Regards, Martijn
> >>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Martijn de Gouw
> >>>>> Designer
> >>>>> Prodrive Technologies
> >>>>> Mobile: +31 63 17 76 161
> >>>>> Phone:  +31 40 26 76 200
> >>>
> >>> --
> >>> Martijn de Gouw
> >>> Designer
> >>> Prodrive Technologies
> >>> Mobile: +31 63 17 76 161
> >>> Phone:  +31 40 26 76 200
> >>
> >>
> >>
> >> --
> >> Thanks,
> >>
> >> Steve
>
> --
> Martijn de Gouw
> Designer
> Prodrive Technologies
> Mobile: +31 63 17 76 161
> Phone:  +31 40 26 76 200

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Many processes end up in uninterruptible sleep accessing cifs mounts
  2019-07-15 23:40                   ` Pavel Shilovsky
@ 2019-08-07  7:34                     ` Martijn de Gouw
  0 siblings, 0 replies; 16+ messages in thread
From: Martijn de Gouw @ 2019-08-07  7:34 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Steve French, Aurélien Aptel, linux-cifs

Hi,

We decided to move back to the longterm kernel provided by Debian:
4.19.0-0.bpo.5-amd64 #1 SMP Debian 4.19.37-3~bpo9+1 (2019-05-18) x86_64,
Combined with the nohandlecache mount option we don't have any process stuck in 'D' yet (uptime of 1 week so far!)

However, this kernel dump showed up in the logs:
[260512.681619] list_del corruption. prev->next should be ffff9e766fa9c380, but was 8400f931ae8cf165
[260512.681995] ------------[ cut here ]------------
[260512.681996] kernel BUG at /build/linux-F9rbp9/linux-4.19.37/lib/list_debug.c:53!
[260512.682194] invalid opcode: 0000 [#1] SMP PTI
[260512.682382] CPU: 39 PID: 1303 Comm: cifsd Not tainted 4.19.0-0.bpo.5-amd64 #1 Debian 4.19.37-3~bpo9+1
[260512.682579] Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4a.0.0226182253 02/26/2018
[260512.682828] RIP: 0010:__list_del_entry_valid+0x70/0x90
[260512.683049] Code: a3 08 a2 e8 60 ab d0 ff 0f 0b 48 89 fe 48 c7 c7 b0 a3 08 a2 e8 4f ab d0 ff 0f 0b 48 89 fe 48 c7 c7 e8 a3 08                                                 a2 e8 3e ab d0 ff <0f> 0b 48 89 fe 48 c7 c7 28 a4 08 a2 e8 2d ab d0 ff 0f 0b 90 90 90
[260512.683795] RSP: 0018:ffffb1880df93dc0 EFLAGS: 00010282
[260512.684196] RAX: 0000000000000054 RBX: ffff9e8a78477800 RCX: 0000000000000000
[260512.684489] RDX: 0000000000000000 RSI: ffff9e8a7fe566b8 RDI: ffff9e8a7fe566b8
[260512.684791] RBP: ffff9e8a784779c0 R08: 0000000000000000 R09: 0000000000000911
[260512.685101] R10: ffffb1880df93c60 R11: ffff9e8a6be72610 R12: ffff9e766fa9c380
[260512.685422] R13: ffff9e8a78477998 R14: ffffb1880df93dd8 R15: 8400f931ae8cf165
[260512.685750] FS:  0000000000000000(0000) GS:ffff9e8a7fe40000(0000) knlGS:0000000000000000
[260512.686094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[260512.686462] CR2: 00007f18114cef80 CR3: 0000003a93e0a005 CR4: 00000000003606e0
[260512.687011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[260512.687504] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[260512.687875] Call Trace:
[260512.688282]  cifs_reconnect+0x2fc/0x880 [cifs]
[260512.688686]  cifs_handle_standard+0x169/0x190 [cifs]
[260512.689092]  cifs_demultiplex_thread+0xa2a/0xbc0 [cifs]
[260512.689515]  ? __switch_to_asm+0x40/0x70
[260512.689939]  kthread+0xf8/0x130
[260512.690378]  ? cifs_handle_standard+0x190/0x190 [cifs]
[260512.690814]  ? kthread_create_worker_on_cpu+0x70/0x70
[260512.691257]  ret_from_fork+0x35/0x40
[260512.691704] Modules linked in: cpufreq_powersave cpufreq_conservative cpufreq_userspace arc4 md4 sha512_ssse3 sha512_generic c                                                mac nls_utf8 cifs nfsv3 nfs ccm dns_resolver fscache intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel k                                                vm irqbypass acpi_power_meter crct10dif_pclmul mgag200 ipmi_ssif acpi_ipmi ttm crc32_pclmul drm_kms_helper ghash_clmulni_intel drm                                                 iTCO_wdt joydev iTCO_vendor_support intel_cstate evdev intel_uncore mei_me sg intel_rapl_perf mei lpc_ich mxm_wmi pcc_cpufreq ipm                                                i_si ipmi_devintf wmi ipmi_msghandler acpi_pad ac button binfmt_misc nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables x_table                                                s autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb sd_mod ahci libahci crc32c_intel fnic libata                                                 libfcoe libfc megaraid_sas aesni_intel
[260512.695513]  scsi_transport_fc aes_x86_64 crypto_simd ehci_pci cryptd ehci_hcd glue_helper scsi_mod igb usbcore enic i2c_algo_                                                bit usb_common dca
[260512.696809] ---[ end trace 469eb33f60a57cab ]---
[260512.722046] RIP: 0010:__list_del_entry_valid+0x70/0x90
[260512.722707] Code: a3 08 a2 e8 60 ab d0 ff 0f 0b 48 89 fe 48 c7 c7 b0 a3 08 a2 e8 4f ab d0 ff 0f 0b 48 89 fe 48 c7 c7 e8 a3 08                                                 a2 e8 3e ab d0 ff <0f> 0b 48 89 fe 48 c7 c7 28 a4 08 a2 e8 2d ab d0 ff 0f 0b 90 90 90
[260512.724106] RSP: 0018:ffffb1880df93dc0 EFLAGS: 00010282
[260512.724831] RAX: 0000000000000054 RBX: ffff9e8a78477800 RCX: 0000000000000000
[260512.725625] RDX: 0000000000000000 RSI: ffff9e8a7fe566b8 RDI: ffff9e8a7fe566b8
[260512.726364] RBP: ffff9e8a784779c0 R08: 0000000000000000 R09: 0000000000000911
[260512.727112] R10: ffffb1880df93c60 R11: ffff9e8a6be72610 R12: ffff9e766fa9c380
[260512.727895] R13: ffff9e8a78477998 R14: ffffb1880df93dd8 R15: 8400f931ae8cf165
[260512.728666] FS:  0000000000000000(0000) GS:ffff9e8a7fe40000(0000) knlGS:0000000000000000
[260512.729432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[260512.730187] CR2: 00007f18114cef80 CR3: 0000003a93e0a005 CR4: 00000000003606e0
[260512.730942] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[260512.731689] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

I looked into the patched that are applied to cifs on the 4.19 branch, but, in my assumption, I did 
not find any related to this particular issue.

Regards, Martijn


On 16-07-2019 01:40, Pavel Shilovsky wrote:
> Thanks for letting us know.
> 
> This looks like a memory problem - similar to the one reported in
> "General protection faults in cifs_reconnect..." thread. It doesn't
> seem like encryption is used here, so the problem is somewhere else.
> 
>  From the stack trace the kernel is 4.20.17, so it has many stable
> patches merged from the crediting series. Could you try older kernels?
> Like 4.20.2, 4.20.5 and 4.20.7 (all 3h have different set of fixes
> applied).
> 
> --
> Best regards,
> Pavel Shilovsky
> 
> вт, 9 июл. 2019 г. в 01:48, Martijn de Gouw
> <martijn.de.gouw@prodrive-technologies.com>:
>>
>> Hi,
>>
>> Here is another kernel log, but it looks like a different issue.
>> All cifs mounts have the nohandlecache option set.
>> This server is running the kernel without the V2 patch:
>>
>> [706049.499153] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> [706049.499183] BUG: unable to handle kernel paging request at ffff8e6381d549d0
>> [706049.499222] PGD 175c02067 P4D 175c02067 PUD 262470063 PMD 8000000101c000e3
>> [706049.499245] Oops: 0011 [#1] PREEMPT SMP PTI
>> [706049.499258] CPU: 0 PID: 77140 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>> [706049.499280] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>> [706049.499320] RIP: 0010:0xffff8e6381d549d0
>> [706049.499332] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
>> [706049.499374] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
>> [706049.499388] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
>> [706049.499405] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
>> [706049.499422] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
>> [706049.499439] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
>> [706049.499456] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
>> [706049.499473] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
>> [706049.499495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [706049.499516] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
>> [706049.499589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [706049.499607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [706049.499625] Call Trace:
>> [706049.499934] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [706049.500076]  ? cifs_reconnect+0x337/0x880 [cifs]
>> [706049.500764] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>> [706049.501157]  ? cifs_handle_standard+0x169/0x190 [cifs]
>> [706049.501166]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
>> [706049.501867] 00000010: 00000001 00000098 0000046b 00000000  ........k.......
>> [706049.502259]  ? finish_task_switch+0x7d/0x290
>> [706049.502980] 00000020: 000002c3 00000001 00000905 00007400  .............t..
>> [706049.503352]  ? cifs_handle_standard+0x190/0x190 [cifs]
>> [706049.504057] 00000030: 00000000 00000000 00000000 00000000  ................
>> [706049.504475]  ? kthread+0xf8/0x130
>> [706049.504478]  ? kthread_create_worker_on_cpu+0x70/0x70
>> [706049.505247] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [706049.505681]  ? ret_from_fork+0x35/0x40
>> [706049.506498] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>> [706049.506926] Modules linked in: dm_snapshot(E) dm_bufio(E) cpufreq_userspace(E) cpufreq_conservative(E) cpufreq_powersave(E) arc4(E) ecb(E) nfsv3(E) md4(E) nfs_acl(E) sha512_ssse3(E) nfs(E) sha512_generic(E) cmac(E) hmac(E) lockd(E) nls_utf8(E) cifs(E) grace(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) vmwgfx(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) sg(E) cryptd(E) joydev(E) evdev(E) vmw_balloon(E) drm(E) glue_helper(E) vmw_vmci(E) serio_raw(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) libata(E) vmw_pvscsi(E) crc32c_intel(E) psmouse(E) e1000e(E) scsi_mod(E) i2c_piix4(E)
>> [706049.507729] 00000010: 00000005 00000068 0000046c 00000000  ....h...l.......
>> [706049.508166] CR2: ffff8e6381d549d0
>> [706049.508965] 00000020: 000002c3 00000001 00000905 00007400  .............t..
>> [706049.513140] ---[ end trace e371dfb71e7921db ]---
>> [706049.513143] RIP: 0010:0xffff8e6381d549d0
>> [706049.513144] Code: ff ff 00 00 00 00 e2 04 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 49 d5 81 63 8e ff ff d0 49 d5 81 63 8e ff ff 00 91 36 88 63 8e
>> [706049.513145] RSP: 0018:ffff9d2bd0c8bdc0 EFLAGS: 00010202
>> [706049.513147] RAX: ffff8e6381d549d0 RBX: ffff8e659b599000 RCX: dead000000000200
>> [706049.513147] RDX: ffff8e66a9fe4280 RSI: 0000000000000246 RDI: ffff8e6381d54998
>> [706049.513148] RBP: ffff8e66a9fe4280 R08: 0000000000000000 R09: ffff8e6381d54970
>> [706049.513148] R10: ffff9d2bd0c8bc10 R11: ffff8e66a346c000 R12: ffff8e659b5991c0
>> [706049.513149] R13: ffff8e66a9fe4280 R14: ffff9d2bd0c8bdd8 R15: ffff8e66a9fe4900
>> [706049.513150] FS:  0000000000000000(0000) GS:ffff8e66adc00000(0000) knlGS:0000000000000000
>> [706049.513151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [706049.513151] CR2: ffff8e6381d549d0 CR3: 0000000427928002 CR4: 00000000003606f0
>> [706049.513201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [706049.521671] 00000030: 00000000 00000000 00000000 00000000  ................
>> [706049.522058] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [706049.523846] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>> [706049.524699] 00000000: 424d53fe 00010040 00000000 00060006  .SMB@...........
>> [706049.525574] 00000010: 00000005 00000000 0000046d 00000000  ........m.......
>> [706049.526449] 00000020: 000002c3 00000001 00000905 00007400  .............t..
>> [706049.527304] 00000030: 00000000 00000000 00000000 00000000  ................
>> [707493.825696] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
>> [707615.670866] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
>> [707737.516154] CIFS VFS: Server stor02.bk.prodrive.nl has not responded in 120 seconds. Reconnecting...
>>
>> Regards, Martijn
>>
>> On 06-07-2019 18:40, Pavel Shilovsky wrote:
>>> Martijn, thanks for the traces. This is a deadlock as mentioned by Aurelien:
>>>
>>> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
>>> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [36247.368158] puppet          D    0  3836    541 0x00000000
>>> [36247.368159] Call Trace:
>>> [36247.368161]  ? __schedule+0x338/0x850
>>> [36247.368163]  schedule+0x3c/0x90
>>> [36247.368164]  schedule_preempt_disabled+0x14/0x20
>>> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
>>> [36247.368166]  ? __switch_to_asm+0x34/0x70
>>> [36247.368167]  ? __switch_to_asm+0x40/0x70
>>> [36247.368169]  ? try_module_get+0x68/0x100
>>> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
>>> [36247.368189]  ? finish_task_switch+0x7d/0x290
>>> [36247.368190]  ? __switch_to_asm+0x40/0x70
>>> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
>>> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
>>> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>> [36247.368209]  ? _raw_spin_lock+0x13/0x30
>>> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
>>> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
>>> [36247.368238]  ? sched_clock+0x5/0x10
>>> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
>>> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
>>> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
>>> [36247.368269]  ? walk_component+0x48/0x360
>>> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>> [36247.368286]  ? filename_lookup+0xf8/0x1a0
>>> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
>>> [36247.368313]  vfs_statx+0x89/0xe0
>>> [36247.368315]  __do_sys_newlstat+0x39/0x70
>>> [36247.368318]  do_syscall_64+0x55/0x100
>>> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [36247.368321] RIP: 0033:0x7f562d3a0335
>>> [36247.368323] Code: Bad RIP value.
>>>
>>> This looks urgent to fix - we have a parallel thread with the similar
>>> problem observed on 5.1 kernel. In order to fix it we need to drop
>>> open shroot mutex before going into SMB2_open_init and other PDU
>>> functions.
>>>
>>> You can try to work around this by specifying "nohandlecache" mount option.
>>>
>>> --
>>> Best regards,
>>> Pavel Shilovsky
>>>
>>> сб, 6 июл. 2019 г. в 06:05, Steve French <smfrench@gmail.com>:
>>>>
>>>> Are we sure this isn't something really simple like the server IP
>>>> address changing?  Remember in 5.0 kernel (also updated cifs-utils)
>>>> the code to allow failover when server's IP address changes was added.
>>>>
>>>> On Sat, Jul 6, 2019 at 6:12 AM Martijn de Gouw
>>>> <martijn.de.gouw@prodrive-technologies.com> wrote:
>>>>>
>>>>> Hi Pavel,
>>>>>
>>>>> On 04-07-2019 23:34, Pavel Shilovsky wrote:
>>>>>> Hi Martijn,
>>>>>>
>>>>>> Thanks for reporting the problem. Have you tried v5.1.5 kernel and
>>>>>> above? That one has many reconnect fixes. If yes, could you please
>>>>>> provide the kernel stack / panic traces when running new versions of
>>>>>> the kernel?
>>>>>
>>>>> I can try to run them for a while on a less critical server. But even if that machine runs fine, there is not really a guarantee the cifs issue we have right now is solved for our critical machines, causing me to be very hesitated to upgrade those.
>>>>>
>>>>> Btw, here is another kernel dump, generated today. Please note that on this machine the cifs mounts are not really used, all 'action'
>>>>> performed on these mounts is the check if they are still mounted (by puppet).
>>>>>
>>>>>
>>>>> [36247.367572] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>>>> [36247.367613]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36247.367630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36247.367648] node_exporter   D    0  3567      1 0x00000000
>>>>> [36247.367650] Call Trace:
>>>>> [36247.367668]  ? __schedule+0x338/0x850
>>>>> [36247.367671]  ? preempt_count_add+0x67/0xb0
>>>>> [36247.367673]  schedule+0x3c/0x90
>>>>> [36247.367675]  schedule_preempt_disabled+0x14/0x20
>>>>> [36247.367676]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36247.367680]  ? try_module_get+0x68/0x100
>>>>> [36247.367792]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36247.367803]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36247.367824]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36247.367826]  ? __switch_to_asm+0x34/0x70
>>>>> [36247.367827]  ? __switch_to_asm+0x40/0x70
>>>>> [36247.367836]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36247.367846]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36247.367848]  ? lookup_fast+0xc8/0x2e0
>>>>> [36247.367857]  smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36247.367861]  ? lookup_fast+0xc8/0x2e0
>>>>> [36247.367863]  ? ___cache_free+0x31/0x2e0
>>>>> [36247.367865]  ? terminate_walk+0x93/0x100
>>>>> [36247.367873]  ? cifs_statfs+0xad/0x300 [cifs]
>>>>> [36247.367880]  cifs_statfs+0xad/0x300 [cifs]
>>>>> [36247.367894]  statfs_by_dentry+0x6a/0x90
>>>>> [36247.367895]  vfs_statfs+0x16/0xc0
>>>>> [36247.367897]  user_statfs+0x50/0xa0
>>>>> [36247.367898]  __do_sys_statfs+0x20/0x50
>>>>> [36247.367902]  do_syscall_64+0x55/0x100
>>>>> [36247.367903]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36247.367905] RIP: 0033:0x4a5c20
>>>>> [36247.367936] Code: Bad RIP value.
>>>>> [36247.367937] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>>>> [36247.367938] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>>>> [36247.367939] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>>>> [36247.367940] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>>>> [36247.367940] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>>>> [36247.367941] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>>>> [36247.367954] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>>>> [36247.367973]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36247.367989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36247.368008] kworker/1:2     D    0 32265      2 0x80000000
>>>>> [36247.368020] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>>>> [36247.368021] Call Trace:
>>>>> [36247.368023]  ? __schedule+0x338/0x850
>>>>> [36247.368025]  schedule+0x3c/0x90
>>>>> [36247.368026]  schedule_preempt_disabled+0x14/0x20
>>>>> [36247.368028]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36247.368037]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>>>> [36247.368046]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>>>> [36247.368054]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36247.368055]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36247.368062]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36247.368071]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>> [36247.368073]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36247.368074]  ? preempt_count_add+0x67/0xb0
>>>>> [36247.368083]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>>>> [36247.368090]  process_one_work+0x188/0x3a0
>>>>> [36247.368092]  worker_thread+0x4c/0x3a0
>>>>> [36247.368094]  ? preempt_count_add+0x67/0xb0
>>>>> [36247.368095]  ? process_one_work+0x3a0/0x3a0
>>>>> [36247.368097]  kthread+0xf8/0x130
>>>>> [36247.368099]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> [36247.368101]  ret_from_fork+0x35/0x40
>>>>> [36247.368103] INFO: task puppet:3836 blocked for more than 120 seconds.
>>>>> [36247.368119]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36247.368135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36247.368158] puppet          D    0  3836    541 0x00000000
>>>>> [36247.368159] Call Trace:
>>>>> [36247.368161]  ? __schedule+0x338/0x850
>>>>> [36247.368163]  schedule+0x3c/0x90
>>>>> [36247.368164]  schedule_preempt_disabled+0x14/0x20
>>>>> [36247.368165]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36247.368166]  ? __switch_to_asm+0x34/0x70
>>>>> [36247.368167]  ? __switch_to_asm+0x40/0x70
>>>>> [36247.368169]  ? try_module_get+0x68/0x100
>>>>> [36247.368178]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36247.368186]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36247.368188]  ? _raw_spin_unlock_irq+0x1d/0x30
>>>>> [36247.368189]  ? finish_task_switch+0x7d/0x290
>>>>> [36247.368190]  ? __switch_to_asm+0x40/0x70
>>>>> [36247.368198]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36247.368199]  ? _raw_spin_lock_irqsave+0x25/0x50
>>>>> [36247.368208]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36247.368209]  ? _raw_spin_lock+0x13/0x30
>>>>> [36247.368218]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>>>> [36247.368227]  ? SMB2_open+0x150/0x520 [cifs]
>>>>> [36247.368235]  SMB2_open+0x150/0x520 [cifs]
>>>>> [36247.368238]  ? sched_clock+0x5/0x10
>>>>> [36247.368247]  ? open_shroot+0x12f/0x200 [cifs]
>>>>> [36247.368255]  open_shroot+0x12f/0x200 [cifs]
>>>>> [36247.368258]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>>>> [36247.368267]  smb2_query_path_info+0x93/0x220 [cifs]
>>>>> [36247.368269]  ? walk_component+0x48/0x360
>>>>> [36247.368278]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>> [36247.368286]  ? filename_lookup+0xf8/0x1a0
>>>>> [36247.368294]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>>>> [36247.368303]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>> [36247.368311]  cifs_getattr+0x5b/0x1b0 [cifs]
>>>>> [36247.368313]  vfs_statx+0x89/0xe0
>>>>> [36247.368315]  __do_sys_newlstat+0x39/0x70
>>>>> [36247.368318]  do_syscall_64+0x55/0x100
>>>>> [36247.368320]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36247.368321] RIP: 0033:0x7f562d3a0335
>>>>> [36247.368323] Code: Bad RIP value.
>>>>> [36247.368324] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>>>> [36247.368325] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>>>> [36247.368326] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>>>> [36247.368326] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>>>> [36247.368327] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>>>> [36247.368328] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>>>> [36368.186922] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>>>> [36368.186988]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36368.187039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36368.187097] node_exporter   D    0  3567      1 0x00000000
>>>>> [36368.187102] Call Trace:
>>>>> [36368.187117]  ? __schedule+0x338/0x850
>>>>> [36368.187123]  ? preempt_count_add+0x67/0xb0
>>>>> [36368.187128]  schedule+0x3c/0x90
>>>>> [36368.187133]  schedule_preempt_disabled+0x14/0x20
>>>>> [36368.187138]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36368.187175]  ? try_module_get+0x68/0x100
>>>>> [36368.187232]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36368.187265]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36368.187300]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36368.187304]  ? __switch_to_asm+0x34/0x70
>>>>> [36368.187307]  ? __switch_to_asm+0x40/0x70
>>>>> [36368.187337]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36368.187372]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36368.187375]  ? lookup_fast+0xc8/0x2e0
>>>>> [36368.187405]  smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36368.187410]  ? lookup_fast+0xc8/0x2e0
>>>>> [36368.187416]  ? ___cache_free+0x31/0x2e0
>>>>> [36368.187421]  ? terminate_walk+0x93/0x100
>>>>> [36368.187449]  ? cifs_statfs+0xad/0x300 [cifs]
>>>>> [36368.187471]  cifs_statfs+0xad/0x300 [cifs]
>>>>> [36368.187477]  statfs_by_dentry+0x6a/0x90
>>>>> [36368.187482]  vfs_statfs+0x16/0xc0
>>>>> [36368.187486]  user_statfs+0x50/0xa0
>>>>> [36368.187491]  __do_sys_statfs+0x20/0x50
>>>>> [36368.187499]  do_syscall_64+0x55/0x100
>>>>> [36368.187503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36368.187507] RIP: 0033:0x4a5c20
>>>>> [36368.187517] Code: Bad RIP value.
>>>>> [36368.187519] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>>>> [36368.187523] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>>>> [36368.187526] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>>>> [36368.187528] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>>>> [36368.187530] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>>>> [36368.187532] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>>>> [36368.187572] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>>>> [36368.187627]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36368.187677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36368.187735] kworker/1:2     D    0 32265      2 0x80000000
>>>>> [36368.187771] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>>>> [36368.187773] Call Trace:
>>>>> [36368.187780]  ? __schedule+0x338/0x850
>>>>> [36368.187785]  schedule+0x3c/0x90
>>>>> [36368.187790]  schedule_preempt_disabled+0x14/0x20
>>>>> [36368.187795]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36368.187826]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>>>> [36368.187855]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>>>> [36368.187880]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36368.187885]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36368.187908]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36368.187936]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>> [36368.187942]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36368.187947]  ? preempt_count_add+0x67/0xb0
>>>>> [36368.187976]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>>>> [36368.187984]  process_one_work+0x188/0x3a0
>>>>> [36368.187990]  worker_thread+0x4c/0x3a0
>>>>> [36368.187994]  ? preempt_count_add+0x67/0xb0
>>>>> [36368.187998]  ? process_one_work+0x3a0/0x3a0
>>>>> [36368.188001]  kthread+0xf8/0x130
>>>>> [36368.188005]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> [36368.188009]  ret_from_fork+0x35/0x40
>>>>> [36368.188014] INFO: task puppet:3836 blocked for more than 120 seconds.
>>>>> [36368.188064]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36368.188114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36368.188182] puppet          D    0  3836    541 0x00000000
>>>>> [36368.188186] Call Trace:
>>>>> [36368.188191]  ? __schedule+0x338/0x850
>>>>> [36368.188196]  schedule+0x3c/0x90
>>>>> [36368.188200]  schedule_preempt_disabled+0x14/0x20
>>>>> [36368.188205]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36368.188208]  ? __switch_to_asm+0x34/0x70
>>>>> [36368.188211]  ? __switch_to_asm+0x40/0x70
>>>>> [36368.188215]  ? try_module_get+0x68/0x100
>>>>> [36368.188244]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36368.188271]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36368.188277]  ? _raw_spin_unlock_irq+0x1d/0x30
>>>>> [36368.188280]  ? finish_task_switch+0x7d/0x290
>>>>> [36368.188283]  ? __switch_to_asm+0x40/0x70
>>>>> [36368.188310]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36368.188314]  ? _raw_spin_lock_irqsave+0x25/0x50
>>>>> [36368.188342]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36368.188346]  ? _raw_spin_lock+0x13/0x30
>>>>> [36368.188376]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>>>> [36368.188405]  ? SMB2_open+0x150/0x520 [cifs]
>>>>> [36368.188432]  SMB2_open+0x150/0x520 [cifs]
>>>>> [36368.188440]  ? sched_clock+0x5/0x10
>>>>> [36368.188470]  ? open_shroot+0x12f/0x200 [cifs]
>>>>> [36368.188497]  open_shroot+0x12f/0x200 [cifs]
>>>>> [36368.188503]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>>>> [36368.188534]  smb2_query_path_info+0x93/0x220 [cifs]
>>>>> [36368.188539]  ? walk_component+0x48/0x360
>>>>> [36368.188569]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>> [36368.188574]  ? filename_lookup+0xf8/0x1a0
>>>>> [36368.188601]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>>>> [36368.188630]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>> [36368.188659]  cifs_getattr+0x5b/0x1b0 [cifs]
>>>>> [36368.188664]  vfs_statx+0x89/0xe0
>>>>> [36368.188670]  __do_sys_newlstat+0x39/0x70
>>>>> [36368.188677]  do_syscall_64+0x55/0x100
>>>>> [36368.188681]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36368.188684] RIP: 0033:0x7f562d3a0335
>>>>> [36368.188689] Code: Bad RIP value.
>>>>> [36368.188691] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>>>> [36368.188695] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>>>> [36368.188696] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>>>> [36368.188698] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>>>> [36368.188700] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>>>> [36368.188702] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>>>> [36489.006262] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>>>> [36489.006300]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36489.006329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36489.006458] node_exporter   D    0  3567      1 0x00000000
>>>>> [36489.006461] Call Trace:
>>>>> [36489.006470]  ? __schedule+0x338/0x850
>>>>> [36489.006474]  ? preempt_count_add+0x67/0xb0
>>>>> [36489.006476]  schedule+0x3c/0x90
>>>>> [36489.006478]  schedule_preempt_disabled+0x14/0x20
>>>>> [36489.006480]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36489.006485]  ? try_module_get+0x68/0x100
>>>>> [36489.006515]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36489.006531]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36489.006547]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36489.006549]  ? __switch_to_asm+0x34/0x70
>>>>> [36489.006550]  ? __switch_to_asm+0x40/0x70
>>>>> [36489.006565]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36489.006581]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36489.006583]  ? lookup_fast+0xc8/0x2e0
>>>>> [36489.006597]  smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36489.006599]  ? lookup_fast+0xc8/0x2e0
>>>>> [36489.006602]  ? ___cache_free+0x31/0x2e0
>>>>> [36489.006604]  ? terminate_walk+0x93/0x100
>>>>> [36489.006619]  ? cifs_statfs+0xad/0x300 [cifs]
>>>>> [36489.006632]  cifs_statfs+0xad/0x300 [cifs]
>>>>> [36489.006635]  statfs_by_dentry+0x6a/0x90
>>>>> [36489.006637]  vfs_statfs+0x16/0xc0
>>>>> [36489.006639]  user_statfs+0x50/0xa0
>>>>> [36489.006641]  __do_sys_statfs+0x20/0x50
>>>>> [36489.006645]  do_syscall_64+0x55/0x100
>>>>> [36489.006647]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36489.006649] RIP: 0033:0x4a5c20
>>>>> [36489.006654] Code: Bad RIP value.
>>>>> [36489.006655] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>>>> [36489.006657] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>>>> [36489.006658] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>>>> [36489.006659] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>>>> [36489.006660] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>>>> [36489.006661] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>>>> [36489.006675] INFO: task kworker/1:2:32265 blocked for more than 120 seconds.
>>>>> [36489.006812]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36489.006940] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36489.007076] kworker/1:2     D    0 32265      2 0x80000000
>>>>> [36489.007100] Workqueue: cifsiod smb2_reconnect_server [cifs]
>>>>> [36489.007101] Call Trace:
>>>>> [36489.007105]  ? __schedule+0x338/0x850
>>>>> [36489.007107]  schedule+0x3c/0x90
>>>>> [36489.007109]  schedule_preempt_disabled+0x14/0x20
>>>>> [36489.007111]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36489.007124]  ? SMB2_sess_auth_rawntlmssp_authenticate+0x69/0x170 [cifs]
>>>>> [36489.007134]  ? SMB2_sess_setup+0x17f/0x2c0 [cifs]
>>>>> [36489.007141]  ? cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36489.007143]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36489.007150]  cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>> [36489.007159]  smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>> [36489.007161]  ? _raw_spin_unlock+0x16/0x30
>>>>> [36489.007163]  ? preempt_count_add+0x67/0xb0
>>>>> [36489.007172]  smb2_reconnect_server+0x1bb/0x350 [cifs]
>>>>> [36489.007176]  process_one_work+0x188/0x3a0
>>>>> [36489.007178]  worker_thread+0x4c/0x3a0
>>>>> [36489.007179]  ? preempt_count_add+0x67/0xb0
>>>>> [36489.007180]  ? process_one_work+0x3a0/0x3a0
>>>>> [36489.007181]  kthread+0xf8/0x130
>>>>> [36489.007183]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>> [36489.007184]  ret_from_fork+0x35/0x40
>>>>> [36489.007186] INFO: task puppet:3836 blocked for more than 120 seconds.
>>>>> [36489.007294]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36489.007409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36489.007540] puppet          D    0  3836    541 0x00000000
>>>>> [36489.007541] Call Trace:
>>>>> [36489.007545]  ? __schedule+0x338/0x850
>>>>> [36489.007548]  schedule+0x3c/0x90
>>>>> [36489.007550]  schedule_preempt_disabled+0x14/0x20
>>>>> [36489.007552]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36489.007553]  ? __switch_to_asm+0x34/0x70
>>>>> [36489.007555]  ? __switch_to_asm+0x40/0x70
>>>>> [36489.007558]  ? try_module_get+0x68/0x100
>>>>> [36489.007580]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36489.007594]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36489.007596]  ? _raw_spin_unlock_irq+0x1d/0x30
>>>>> [36489.007598]  ? finish_task_switch+0x7d/0x290
>>>>> [36489.007600]  ? __switch_to_asm+0x40/0x70
>>>>> [36489.007613]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36489.007615]  ? _raw_spin_lock_irqsave+0x25/0x50
>>>>> [36489.007628]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36489.007630]  ? _raw_spin_lock+0x13/0x30
>>>>> [36489.007644]  ? cifs_mid_q_entry_release+0x15/0x40 [cifs]
>>>>> [36489.007658]  ? SMB2_open+0x150/0x520 [cifs]
>>>>> [36489.007672]  SMB2_open+0x150/0x520 [cifs]
>>>>> [36489.007677]  ? sched_clock+0x5/0x10
>>>>> [36489.007692]  ? open_shroot+0x12f/0x200 [cifs]
>>>>> [36489.007707]  open_shroot+0x12f/0x200 [cifs]
>>>>> [36489.007711]  ? __follow_mount_rcu.isra.31+0x3c/0xf0
>>>>> [36489.007727]  smb2_query_path_info+0x93/0x220 [cifs]
>>>>> [36489.007729]  ? walk_component+0x48/0x360
>>>>> [36489.007745]  cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>> [36489.007748]  ? filename_lookup+0xf8/0x1a0
>>>>> [36489.007762]  ? build_path_from_dentry_optional_prefix+0x1e9/0x440 [cifs]
>>>>> [36489.007778]  cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>> [36489.007791]  cifs_getattr+0x5b/0x1b0 [cifs]
>>>>> [36489.007794]  vfs_statx+0x89/0xe0
>>>>> [36489.007797]  __do_sys_newlstat+0x39/0x70
>>>>> [36489.007801]  do_syscall_64+0x55/0x100
>>>>> [36489.007802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36489.007805] RIP: 0033:0x7f562d3a0335
>>>>> [36489.007808] Code: Bad RIP value.
>>>>> [36489.007809] RSP: 002b:00007ffc6bbc4398 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
>>>>> [36489.007811] RAX: ffffffffffffffda RBX: 00007ffc6bbc43b0 RCX: 00007f562d3a0335
>>>>> [36489.007812] RDX: 00007ffc6bbc43b0 RSI: 00007ffc6bbc43b0 RDI: 000055e4e5e1bd50
>>>>> [36489.007813] RBP: 000055e4e2eb65e0 R08: 0000000000000002 R09: 000055e4e2eb69e0
>>>>> [36489.007814] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
>>>>> [36489.007815] R13: 000055e4e2ef7258 R14: 000055e4e2ef8810 R15: 000055e4e2f92ad0
>>>>> [36609.825705] INFO: task node_exporter:3567 blocked for more than 120 seconds.
>>>>> [36609.825836]       Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>> [36609.825869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [36609.825908] node_exporter   D    0  3567      1 0x00000000
>>>>> [36609.825910] Call Trace:
>>>>> [36609.825919]  ? __schedule+0x338/0x850
>>>>> [36609.825922]  ? preempt_count_add+0x67/0xb0
>>>>> [36609.825925]  schedule+0x3c/0x90
>>>>> [36609.825927]  schedule_preempt_disabled+0x14/0x20
>>>>> [36609.825929]  __mutex_lock.isra.7+0x19f/0x540
>>>>> [36609.825933]  ? try_module_get+0x68/0x100
>>>>> [36609.825960]  ? smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36609.825979]  smb2_reconnect+0x1d7/0x4b0 [cifs]
>>>>> [36609.825999]  smb2_plain_req_init+0x30/0x240 [cifs]
>>>>> [36609.826001]  ? __switch_to_asm+0x34/0x70
>>>>> [36609.826002]  ? __switch_to_asm+0x40/0x70
>>>>> [36609.826020]  SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>> [36609.826039]  ? smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36609.826040]  ? lookup_fast+0xc8/0x2e0
>>>>> [36609.826057]  smb2_queryfs+0x13d/0x350 [cifs]
>>>>> [36609.826060]  ? lookup_fast+0xc8/0x2e0
>>>>> [36609.826064]  ? ___cache_free+0x31/0x2e0
>>>>> [36609.826066]  ? terminate_walk+0x93/0x100
>>>>> [36609.826084]  ? cifs_statfs+0xad/0x300 [cifs]
>>>>> [36609.826099]  cifs_statfs+0xad/0x300 [cifs]
>>>>> [36609.826103]  statfs_by_dentry+0x6a/0x90
>>>>> [36609.826104]  vfs_statfs+0x16/0xc0
>>>>> [36609.826106]  user_statfs+0x50/0xa0
>>>>> [36609.826108]  __do_sys_statfs+0x20/0x50
>>>>> [36609.826113]  do_syscall_64+0x55/0x100
>>>>> [36609.826115]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [36609.826117] RIP: 0033:0x4a5c20
>>>>> [36609.826123] Code: Bad RIP value.
>>>>> [36609.826124] RSP: 002b:000000c0003ef490 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
>>>>> [36609.826126] RAX: ffffffffffffffda RBX: 000000c00002a000 RCX: 00000000004a5c20
>>>>> [36609.826127] RDX: 0000000000000000 RSI: 000000c0003ef5c0 RDI: 000000c0004510e0
>>>>> [36609.826128] RBP: 000000c0003ef4f0 R08: 0000000000000000 R09: 0000000000000000
>>>>> [36609.826129] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
>>>>> [36609.826130] R13: 0000000000000088 R14: 0000000000000087 R15: 0000000000000100
>>>>> [73117.654990] CIFS VFS: Send error in SessSetup = -126
>>>>>
>>>>> Regards, Martijn
>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Pavel Shilovsky
>>>>>>
>>>>>> чт, 4 июл. 2019 г. в 06:36, Martijn de Gouw
>>>>>> <martijn.de.gouw@prodrive-technologies.com>:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 04-07-2019 13:22, Aurélien Aptel wrote:
>>>>>>>> Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> writes:
>>>>>>>>>> Are there any kernel oops/panic with stack traces and register dumps in
>>>>>>>>>> the log?
>>>>>>>>>>
>>>>>>>>>> You can inspect the kernel stack trace of the hung processes (to see where
>>>>>>>>>> they are stuck) by printing the file /proc/<pid>/stack.
>>>>>>>>>
>>>>>>>>> These are the stacks of all processes that are D, most of them being df.
>>>>>>>>> I also attached the cifs Stats output below.
>>>>>>>>
>>>>>>>> Ok thanks. What about Oops or BUG or panic in dmesg logs, did you see
>>>>>>>> any?
>>>>>>>
>>>>>>> I did find the following messages in the dmesg of one of the servers:
>>>>>>>
>>>>>>> [    4.797893] FS-Cache: Duplicate cookie detected
>>>>>>> [    4.797915] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>>>>> [    4.797934] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>>>>> [    4.797949] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>>>> [    4.797963] FS-Cache: N-cookie c=000000005d0bf4a5 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>>>>> [    4.797982] FS-Cache: N-cookie d=0000000020a06fab n=000000004e1e47aa
>>>>>>> [    4.797997] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>>>> [    4.798643] FS-Cache: Duplicate cookie detected
>>>>>>> [    4.798659] FS-Cache: O-cookie c=000000001a791554 [p=00000000fb6f31ee fl=222 nc=0 na=1]
>>>>>>> [    4.798679] FS-Cache: O-cookie d=0000000020a06fab n=00000000654600e7
>>>>>>> [    4.798695] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>>>> [    4.798714] FS-Cache: N-cookie c=00000000cbe44971 [p=00000000fb6f31ee fl=2 nc=0 na=1]
>>>>>>> [    4.798733] FS-Cache: N-cookie d=0000000020a06fab n=00000000ab0e78a6
>>>>>>> [    4.798747] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>>>> [    4.906667] FS-Cache: Netfs 'nfs' registered for caching
>>>>>>> [12738.729173] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [99125.480751] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [185517.295175] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [250515.749714] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>>>>> [250515.749740] BUG: unable to handle kernel paging request at ffff8ec52a6fe1d0
>>>>>>> [250515.749757] PGD 1b2602067 P4D 1b2602067 PUD 42dbff063 PMD 42a357063 PTE 800000042a6fe063
>>>>>>> [250515.749779] Oops: 0011 [#1] PREEMPT SMP PTI
>>>>>>> [250515.749792] CPU: 1 PID: 796 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>>>> [250515.749812] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>>>>> [250515.749844] RIP: 0010:0xffff8ec52a6fe1d0
>>>>>>> [250515.749860] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>>>>> [250515.749914] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>>>>> [250515.749927] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>>>>> [250515.749944] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>>>>> [250515.749962] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>>>>> [250515.749979] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>>>>> [250515.749997] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>>>>> [250515.750014] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>>>>> [250515.750033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [250515.750048] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>>>>> [250515.750100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [250515.750102] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>>>> [250515.750119] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [250515.750120] Call Trace:
>>>>>>> [250515.750149] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>>>>> [250515.750193]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>>>>> [250515.750201]  ? cifs_handle_standard+0x169/0x190 [cifs]
>>>>>>> [250515.750216] 00000010: 10000009 00000098 00000db2 00000000  ................
>>>>>>> [250515.750234]  ? cifs_demultiplex_thread+0x9e5/0xbc0 [cifs]
>>>>>>> [250515.750240] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>>>> [250515.750259]  ? finish_task_switch+0x7d/0x290
>>>>>>> [250515.750271] 00000030: 5a9e63f3 6204a9b0 587058e3 0d45419b  .c.Z...b.XpX.AE.
>>>>>>> [250515.750295]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>>>>> [250515.750728] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>>>> [250515.751200]  ? kthread+0xf8/0x130
>>>>>>> [250515.751639] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>>>>> [250515.752102]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>>>> [250515.752554] 00000010: 0000000d 00000068 00000db3 00000000  ....h...........
>>>>>>> [250515.752997]  ? ret_from_fork+0x35/0x40
>>>>>>> [250515.753429] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>>>> [250515.753874] Modules linked in: cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_conservative(E) arc4(E) ecb(E) md4(E) nfsv3(E) nfs_acl(E) nfs(E) sha512_ssse3(E) sha512_generic(E) lockd(E) cmac(E) grace(E) hmac(E) nls_utf8(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) vmw_vsock_vmci_transport(E) vsock(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) joydev(E) evdev(E) crypto_simd(E) vmwgfx(E) serio_raw(E) cryptd(E) glue_helper(E) ttm(E) sg(E) vmw_vmci(E) drm_kms_helper(E) drm(E) button(E) ac(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) ata_generic(E) sd_mod(E) ata_piix(E) crc32c_intel(E) psmouse(E) libata(E) vmw_pvscsi(E) vmxnet3(E) scsi_mod(E) i2c_piix4(E)
>>>>>>> [250515.754336] 00000030: 49d4fd21 858665a2 fde5288f 01d2d919  !..I.e...(......
>>>>>>> [250515.754766] CR2: ffff8ec52a6fe1d0
>>>>>>> [250515.758927] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>>>> [250515.759389] ---[ end trace 92ea62cd080150de ]---
>>>>>>> [250515.759879] 00000000: 424d53fe 00010040 00000000 00030006  .SMB@...........
>>>>>>> [250515.760357] RIP: 0010:0xffff8ec52a6fe1d0
>>>>>>> [250515.761841] Code: ff ff 00 00 00 00 fd 01 00 00 00 7d 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> e1 6f 2a c5 8e ff ff d0 e1 6f 2a c5 8e ff ff 80 70 82 2b c5 8e
>>>>>>> [250515.763287] RSP: 0018:ffff9e60c2e2bdc0 EFLAGS: 00010202
>>>>>>> [250515.763778] RAX: ffff8ec52a6fe1d0 RBX: ffff8ec526fce800 RCX: dead000000000200
>>>>>>> [250515.764060] 00000010: 0000000d 00000000 00000db4 00000000  ................
>>>>>>> [250515.764310] RDX: ffff8ec52d3e6800 RSI: 0000000000000246 RDI: ffff8ec52a6fe198
>>>>>>> [250515.764311] RBP: ffff8ec52d3e6800 R08: 0000000000000002 R09: ffff8ec52a6fe170
>>>>>>> [250515.764312] R10: ffff9e60c2e2bc10 R11: ffff8ec527088000 R12: ffff8ec526fce9c0
>>>>>>> [250515.764314] R13: ffff8ec52d3e6800 R14: ffff9e60c2e2bdd8 R15: ffff8ec52d3e6b80
>>>>>>> [250515.765491] 00000020: 0000025c 00000001 640002e9 0000c81d  \..........d....
>>>>>>> [250515.765836] FS:  0000000000000000(0000) GS:ffff8ec52fa80000(0000) knlGS:0000000000000000
>>>>>>> [250515.767899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [250515.768367] CR2: ffff8ec52a6fe1d0 CR3: 000000042a53c005 CR4: 00000000003606e0
>>>>>>> [250515.768919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [250515.769419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [250515.769591] 00000030: 879543e1 572c0574 e53861ba 97ccc40b  .C..t.,W.a8.....
>>>>>>> [271909.743603] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [358301.908585] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [444693.566453] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [531086.090040] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [617477.785390] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [703869.556705] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [790261.656853] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [876653.496928] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [963045.816742] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [1049437.566219] CIFS VFS: Send error in SessSetup = -126
>>>>>>>
>>>>>>>
>>>>>>> And from another server:
>>>>>>> [    4.253091] FS-Cache: Duplicate cookie detected
>>>>>>> [    4.253120] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>>>>> [    4.253153] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>>>>> [    4.253179] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>>>> [    4.253201] FS-Cache: N-cookie c=00000000c3bbbddd [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>>>>> [    4.253235] FS-Cache: N-cookie d=00000000e8e68765 n=00000000335882b3
>>>>>>> [    4.253262] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>>>> [    4.254107] FS-Cache: Duplicate cookie detected
>>>>>>> [    4.254130] FS-Cache: O-cookie c=000000004cc29d26 [p=0000000017dbbbc0 fl=222 nc=0 na=1]
>>>>>>> [    4.254161] FS-Cache: O-cookie d=00000000e8e68765 n=0000000012869fa7
>>>>>>> [    4.254185] FS-Cache: O-key=[8] '020001bd0a010102'
>>>>>>> [    4.254230] FS-Cache: N-cookie c=000000000ec2f0bb [p=0000000017dbbbc0 fl=2 nc=0 na=1]
>>>>>>> [    4.254261] FS-Cache: N-cookie d=00000000e8e68765 n=0000000024706210
>>>>>>> [    4.254285] FS-Cache: N-key=[8] '020001bd0a010102'
>>>>>>> [    4.329147] CIFS VFS: BAD_NETWORK_NAME: \\stor02.bk.prodrive.nl\userdata$
>>>>>>> [    4.330107] CIFS VFS: cifs_mount failed w/return code = -2
>>>>>>> [65206.127542] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [151597.808064] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [237989.956447] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [324380.937984] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [402750.869518] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>>>>>> [402750.869594] BUG: unable to handle kernel paging request at ffff909bf61609d0
>>>>>>> [402750.869650] PGD 1ac02067 P4D 1ac02067 PUD 1385f5063 PMD 136142063 PTE 8000000136160063
>>>>>>> [402750.869716] Oops: 0011 [#1] PREEMPT SMP PTI
>>>>>>> [402750.869753] CPU: 0 PID: 797 Comm: cifsd Tainted: G            E     4.20.17-pd-4.20.y #20190611
>>>>>>> [402750.869818] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
>>>>>>> [402750.869926] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>>>> [402750.869947] RIP: 0010:0xffff909bf61609d0
>>>>>>> [402750.870013] 00000000: 424d53fe 00010040 00000000 00000005  .SMB@...........
>>>>>>> [402750.870198] Code: ff ff 00 00 00 00 4e 01 00 00 00 7d 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>>>>> [402750.870275] 00000010: 10000009 00000098 000002b5 00000000  ................
>>>>>>> [402750.870450] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>>>>> [402750.870469] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>>>>> [402750.870509] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>>>>> [402750.870511] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>>>>> [402750.870530] 00000030: 79955e4a 0dacb8eb 025edebd 4efa7788  J^.y......^..w.N
>>>>>>> [402750.870582] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>>>>> [402750.870585] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>>>>> [402750.870605] CIFS VFS: No task to wake, unknown frame received! NumMids 3
>>>>>>> [402750.870656] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>>>>> [402750.870660] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>>>>> [402750.870677] 00000000: 424d53fe 00010040 00000000 00000010  .SMB@...........
>>>>>>> [402750.870731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [402750.870733] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>>>>> [402750.870752] 00000010: 0000000d 00000068 000002b6 00000000  ....h...........
>>>>>>> [402750.870888] 00000020: 000001cd 00000001 44000cd9 0000c824  ...........D$...
>>>>>>> [402750.870894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [402750.870907] 00000030: f063942a cae9f6e1 d5327c26 91fc6f33  *.c.....&|2.3o..
>>>>>>> [402750.872321] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [402750.872324] Call Trace:
>>>>>>> [402750.872381]  ? cifs_reconnect+0x337/0x880 [cifs]
>>>>>>> [402750.872410]  ? cifs_readv_from_socket+0x211/0x260 [cifs]
>>>>>>> [402750.875687]  ? cifs_read_from_socket+0x4a/0x70 [cifs]
>>>>>>> [402750.876128]  ? _raw_spin_unlock_irqrestore+0x20/0x40
>>>>>>> [402750.876555]  ? try_to_wake_up+0x54/0x540
>>>>>>> [402750.876991]  ? cifs_small_buf_get+0x16/0x20 [cifs]
>>>>>>> [402750.877426]  ? cifs_demultiplex_thread+0xdd/0xbc0 [cifs]
>>>>>>> [402750.877834]  ? finish_task_switch+0x7d/0x290
>>>>>>> [402750.878252]  ? cifs_handle_standard+0x190/0x190 [cifs]
>>>>>>> [402750.878651]  ? kthread+0xf8/0x130
>>>>>>> [402750.879034]  ? kthread_create_worker_on_cpu+0x70/0x70
>>>>>>> [402750.879412]  ? ret_from_fork+0x35/0x40
>>>>>>> [402750.879786] Modules linked in: cpufreq_conservative(E) cpufreq_powersave(E) cpufreq_userspace(E) arc4(E) ecb(E) md4(E) sha512_ssse3(E) sha512_generic(E) cmac(E) hmac(E) nfsv3(E) nfs_acl(E) nls_utf8(E) nfs(E) lockd(E) grace(E) cifs(E) ccm(E) dns_resolver(E) fscache(E) sb_edac(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) joydev(E) cryptd(E) vmw_balloon(E) evdev(E) glue_helper(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) sg(E) drm(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) ac(E) button(E) auth_rpcgss(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ata_piix(E) vmw_pvscsi(E) libata(E) crc32c_intel(E) psmouse(E) vmxnet3(E) i2c_piix4(E) scsi_mod(E)
>>>>>>> [402750.883384] CR2: ffff909bf61609d0
>>>>>>> [402750.883783] ---[ end trace 08b06875e82513eb ]---
>>>>>>> [402750.884200] RIP: 0010:0xffff909bf61609d0
>>>>>>> [402750.884598] Code: ff ff 00 00 00 00 52 01 00 00 00 7d 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <d0> 09 16 f6 9b 90 ff ff d0 09 16 f6 9b 90 ff ff c0 40 1f f2 9b 90
>>>>>>> [402750.885822] RSP: 0000:ffffa18d41adbd28 EFLAGS: 00010202
>>>>>>> [402750.886229] RAX: ffff909bf61609d0 RBX: ffff909bf6c3f800 RCX: dead000000000200
>>>>>>> [402750.886651] RDX: ffff909bf74fa980 RSI: 0000000000000246 RDI: ffff909bf6160998
>>>>>>> [402750.887065] RBP: ffff909bf74fa980 R08: 0000000000000000 R09: ffff909bf6c3f970
>>>>>>> [402750.887478] R10: ffffa18d406a7cc0 R11: 0000000000000000 R12: ffff909bf6c3f9c0
>>>>>>> [402750.887883] R13: ffff909bf74fa980 R14: ffffa18d41adbd40 R15: ffff909bf74fa100
>>>>>>> [402750.888285] FS:  0000000000000000(0000) GS:ffff909bf9c00000(0000) knlGS:0000000000000000
>>>>>>> [402750.888692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [402750.889093] CR2: ffff909bf61609d0 CR3: 0000000136e56004 CR4: 00000000003606f0
>>>>>>> [402750.889559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [402750.889972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> [410772.901264] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [497165.194662] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [583557.492304] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [669948.937016] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [756341.390112] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [842733.557002] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [929125.303428] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [1015516.629380] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [1101908.464697] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [1188300.531261] CIFS VFS: Send error in SessSetup = -126
>>>>>>> [1274692.583049] CIFS VFS: Send error in SessSetup = -126
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The individual stack dumps are pretty useful. Here is my theory:
>>>>>>>>
>>>>>>>>> pid: 9505
>>>>>>>>> syscall: 4 0x56550a2ec470 0x7ffede42e9a0 0x7ffede42e9a0 0x83a 0x3 0x20
>>>>>>>>> 0x7ffede42e8f8 0x7f7f8928f295
>>>>>>>>> [<0>] open_shroot+0x43/0x200 [cifs]
>>>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>>>
>>>>>>>> Almost all of the processes have the same stack trace. They are stuck at
>>>>>>>> open_shroot()+0x43 which is probably
>>>>>>>>
>>>>>>>>         mutex_lock(&tcon->crfid.fid_mutex);
>>>>>>>>
>>>>>>>> Then there are only 2 other processes stuck somewhere in the same code path
>>>>>>>> (open_shroot) but deeper, meaning they have the locks that the other
>>>>>>>> processes are waiting for:
>>>>>>>>
>>>>>>>>
>>>>>>>>> pid: 22858
>>>>>>>>> syscall: 4 0x564b46285d10 0x7ffcea3f9a80 0x7ffcea3f9a80 0x83a 0x3 0x20
>>>>>>>>> 0x7ffcea3f99d8 0x7f6cc78c7295
>>>>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>>> [<0>] 0xffffffffffffffff
>>>>>>>>
>>>>>>>>
>>>>>>>>> pid: 20027
>>>>>>>>> syscall: 4 0x55a3c7d767d0 0x7ffe51432ab0 0x7ffe51432ab0 0x83a
>>>>>>>>> 0x55a3c7d75c40 0x20 0x7ffe51432a08 0x7f5f7c4e7295
>>>>>>>>> [<0>] cifs_mark_open_files_invalid+0x54/0xa0 [cifs]
>>>>>>>>> [<0>] smb2_reconnect+0x2d6/0x4b0 [cifs]
>>>>>>>>> [<0>] smb2_plain_req_init+0x30/0x240 [cifs]
>>>>>>>>> [<0>] SMB2_open_init+0x6d/0x7c0 [cifs]
>>>>>>>>> [<0>] SMB2_open+0x150/0x520 [cifs]
>>>>>>>>> [<0>] open_shroot+0x12f/0x200 [cifs]
>>>>>>>>> [<0>] smb2_query_path_info+0x93/0x220 [cifs]
>>>>>>>>> [<0>] cifs_get_inode_info+0x580/0xb10 [cifs]
>>>>>>>>> [<0>] cifs_revalidate_dentry_attr+0xdc/0x3e0 [cifs]
>>>>>>>>> [<0>] cifs_getattr+0x5b/0x1b0 [cifs]
>>>>>>>>> [<0>] vfs_statx+0x89/0xe0
>>>>>>>>> [<0>] __do_sys_newstat+0x39/0x70
>>>>>>>>> [<0>] do_syscall_64+0x55/0x100
>>>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>>> [<0>] 0xffffffffffffffff
>>>>>>>>
>>>>>>>> Due to timeouts maybe the Open request needs
>>>>>>>> to reconnect the server/ses/tcon and to do this it calls
>>>>>>>> cifs_mark_open_files_invalid() and gets stuck somewhere there.
>>>>>>>>
>>>>>>>>             spin_lock(&tcon->open_file_lock);
>>>>>>>>             list_for_each_safe(tmp, tmp1, &tcon->openFileList) {
>>>>>>>>                     open_file = list_entry(tmp, struct cifsFileInfo, tlist);
>>>>>>>>                     open_file->invalidHandle = true;
>>>>>>>>                     open_file->oplock_break_cancelled = true;
>>>>>>>>             }
>>>>>>>>             spin_unlock(&tcon->open_file_lock);
>>>>>>>>
>>>>>>>>             mutex_lock(&tcon->crfid.fid_mutex); <=== most likely here
>>>>>>>>             tcon->crfid.is_valid = false;
>>>>>>>>             memset(tcon->crfid.fid, 0, sizeof(struct cifs_fid));
>>>>>>>>             mutex_unlock(&tcon->crfid.fid_mutex);
>>>>>>>>
>>>>>>>> I think these processes are trying to lock the same lock twice: one in
>>>>>>>> open_shroot() and since Open ends up having to reconnect, once again in
>>>>>>>> mark_open_files_invalid(). I think it's the same lock because I don't
>>>>>>>> see why the tcon pointers would be different in those 2 spots. Kernel
>>>>>>>> mutexes are not reentrant so this is a deadlock.
>>>>>>>
>>>>>>> Is there anything we can do about this? Is this maybe already fixed in newer kernels?
>>>>>>>
>>>>>>> Regards, Martijn
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Martijn de Gouw
>>>>>>> Designer
>>>>>>> Prodrive Technologies
>>>>>>> Mobile: +31 63 17 76 161
>>>>>>> Phone:  +31 40 26 76 200
>>>>>
>>>>> --
>>>>> Martijn de Gouw
>>>>> Designer
>>>>> Prodrive Technologies
>>>>> Mobile: +31 63 17 76 161
>>>>> Phone:  +31 40 26 76 200
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Steve
>>
>> --
>> Martijn de Gouw
>> Designer
>> Prodrive Technologies
>> Mobile: +31 63 17 76 161
>> Phone:  +31 40 26 76 200

-- 
Martijn de Gouw
Designer
Prodrive Technologies
Mobile: +31 63 17 76 161
Phone:  +31 40 26 76 200 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-08-07  7:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-03 18:12 Many processes end up in uninterruptible sleep accessing cifs mounts Martijn de Gouw
2019-07-03 19:17 ` Aurélien Aptel
2019-07-03 19:24   ` Steve French
     [not found]   ` <1fc4f6d0-6cdc-69a5-4359-23484d6bdfc9@prodrive-technologies.com>
2019-07-04 11:22     ` Aurélien Aptel
2019-07-04 13:36       ` Martijn de Gouw
2019-07-04 21:34         ` Pavel Shilovsky
2019-07-06 11:11           ` Martijn de Gouw
2019-07-06 13:05             ` Steve French
2019-07-06 16:40               ` Pavel Shilovsky
2019-07-09  8:48                 ` Martijn de Gouw
2019-07-15 23:40                   ` Pavel Shilovsky
2019-08-07  7:34                     ` Martijn de Gouw
2019-07-08  9:04               ` Martijn de Gouw
2019-07-08 11:06                 ` Aurélien Aptel
2019-07-08 13:18                   ` Martijn de Gouw
2019-07-04 21:08       ` Pavel Shilovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).