* Frequent reconnections / session startups?
@ 2019-08-26  6:55 James Wettenhall
  2019-08-26 14:55 ` Steve French
  0 siblings, 1 reply; 10+ messages in thread
From: James Wettenhall @ 2019-08-26  6:55 UTC (permalink / raw)
  To: linux-cifs

Hi,

We run a Django / Celery application which makes heavy use of CIFS
mounts.  We are experiencing frequent reconnections / session startups
and would like to understand how to avoid hammering the CIFS server
and/or the authentication server.  We've had multiple reports of
DoS-like hammering from server admins, causing frequent
re-authentication attempts and in one case causing core dumps on the
CIFS server.

Our CIFS client VMs have the following:

OS: Ubuntu 18.04.3
Kernel: 4.15.0-58-generic
mount.cifs: 6.8

Current mount options:
rw,relatime,vers=3.0,sec=ntlmssp,cache=strict,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1
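
For context, that corresponds roughly to a mount command like the
following (the server name, share, mount point and username are
placeholders, not our real values):

    sudo mount -t cifs //hsm-server/share /mnt/hsm \
        -o username=svc_user,rw,vers=3.0,sec=ntlmssp,cache=strict,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1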

We don't run the CIFS server, but we can request any information
required to diagnose the issue.

Over the past 10 hours, the kernel log on one of our virtual machines has accumulated:

8453 kern.log messages including "CIFS"

To break that down, we have:

8305 "Free previous auth_key.response" messages
111 "validate protocol negotiate failed: -11" messages
26 "Close unmatched open" messages
7 "has not responded in 120 seconds" messages
4  "cifs_mount failed w/return code = -11" messages

The server is an HSM (Hierarchical Storage Management) system, so it
can be slow to respond if our application requests a file which is
only available on tape, not on disk.

The most common operation our application is performing on the
CIFS-mounted files is calculating MD5 checksums - with many Celery
worker processes running concurrently.

We would appreciate any advice on how to investigate further.

Thanks,
James

* Re: Frequent reconnections / session startups?
  2019-08-26  6:55 Frequent reconnections / session startups? James Wettenhall
@ 2019-08-26 14:55 ` Steve French
  2019-08-28  1:50   ` James Wettenhall
  2019-09-02  0:23   ` James Wettenhall
  0 siblings, 2 replies; 10+ messages in thread
From: Steve French @ 2019-08-26 14:55 UTC (permalink / raw)
  To: James Wettenhall; +Cc: CIFS

If you think that the disconnects are due to timeouts accessing files
on offline storage, you can also try mounting with the "hard" mount
option.  The mount parm "echo_interval" can also be increased to make
it less likely that we give up on an unresponsive server (it defaults
to 60 seconds and can be set to a maximum of "echo_interval=600", i.e.
600 seconds).
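
For example, something along these lines (server, share, mount point
and username are placeholders; other options as in your current mount):

    sudo mount -t cifs //server/share /mnt/point \
        -o username=youruser,vers=3.0,sec=ntlmssp,cache=strict,hard,nounix,serverino,mapposix,echo_interval=600,actimeo=1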

There were many fixes relating to crediting and reconnection that went
in almost a year ago, but they would not be in an older kernel like
4.15 unless Ubuntu backported them.  Fortunately, Ubuntu makes it easy
to check whether a newer kernel fixes this by installing a newer kernel
on your client as an experiment (see
https://wiki.ubuntu.com/Kernel/MainlineBuilds).
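
A rough outline of such a test install (the exact .deb package names
depend on the kernel version you pick from that page):

    # download the .deb packages for the chosen mainline version into an
    # empty directory, then:
    sudo dpkg -i *.deb
    sudo reboot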

If, after installing a more recent mainline kernel as a quick test, you
don't see the reconnect problem, that would make it easier to ask
Ubuntu to backport the various reconnect fixes marked for stable that
went in late last year (or you could simply continue to use the more
recent kernel).

Also note that cifs.ko now supports dynamic tracing, which makes it
easier to trace reconnect events (or all cifs events, e.g. "trace-cmd
record -e cifs") and can sometimes help narrow down the cause.
Reconnect statistics are also updated in /proc/fs/cifs/Stats.
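
Something along these lines, run while reproducing the problem (the
grep is just an example filter):

    trace-cmd record -e cifs      # stop with Ctrl-C once the problem has occurred
    trace-cmd report | grep -i reconnect
    cat /proc/fs/cifs/Stats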

On Mon, Aug 26, 2019 at 1:57 AM James Wettenhall
<james.wettenhall@monash.edu> wrote:
>
> Hi,
>
> We run a Django / Celery application which makes heavy use of CIFS
> mounts.  We are experiencing frequent reconnections / session startups
> and would like to understand how to avoid hammering the CIFS server
> and/or the authentication server.  We've had multiple reports of
> DoS-like hammering from server admins, causing frequent
> re-authentication attempts and in one case causing core dumps on the
> CIFS server.
>
> Our CIFS client VMs have the following:
>
> OS: Ubuntu 18.04.3
> Kernel: 4.15.0-58-generic
> mount.cifs: 6.8
>
> Current mount options:
> rw,relatime,vers=3.0,sec=ntlmssp,cache=strict,soft,nounix,serverino,mapposix,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1
>
> We don't run the CIFS server, but we can request any information
> required to diagnose the issue.
>
> Over the past 10 hours, the kernel log on one of our virtual machines has accumulated:
>
> 8453 kern.log messages including "CIFS"
>
> To break that down, we have:
>
> 8305 "Free previous auth_key.response" messages
> 111 "validate protocol negotiate failed: -11" messages
> 26 "Close unmatched open" messages
> 7 "has not responded in 120 seconds" messages
> 4  "cifs_mount failed w/return code = -11" messages
>
> The server is an HSM (Hierarchical Storage Management) system, so it
> can be slow to respond if our application requests a file which is
> only available on tape, not on disk.
>
> The most common operation our application is performing on the
> CIFS-mounted files is calculating MD5 checksums - with many Celery
> worker processes running concurrently.
>
> We would appreciate any advice on how to investigate further.
>
> Thanks,
> James



-- 
Thanks,

Steve

* Re: Frequent reconnections / session startups?
  2019-08-26 14:55 ` Steve French
@ 2019-08-28  1:50   ` James Wettenhall
  2019-09-02  0:23   ` James Wettenhall
  1 sibling, 0 replies; 10+ messages in thread
From: James Wettenhall @ 2019-08-28  1:50 UTC (permalink / raw)
  To: Steve French; +Cc: CIFS

Steve,

I just wanted to say thanks for the quick and detailed response - this
is extremely helpful.

It could take a few days before we can report back on which of these
recommendations was most helpful, given some challenges with
reproducing the problem.

We've been upgrading some VMs to Kernel 5.0 using:

    https://wiki.ubuntu.com/Kernel/LTSEnablementStack

and so far the results look very promising...
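
For reference, the upgrade itself was essentially the following
(package name per the wiki page above):

    sudo apt-get install --install-recommends linux-generic-hwe-18.04
    sudo reboot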

Cheers,
James

* Re: Frequent reconnections / session startups?
  2019-08-26 14:55 ` Steve French
  2019-08-28  1:50   ` James Wettenhall
@ 2019-09-02  0:23   ` James Wettenhall
  2019-09-03 10:38     ` Aurélien Aptel
  1 sibling, 1 reply; 10+ messages in thread
From: James Wettenhall @ 2019-09-02  0:23 UTC (permalink / raw)
  To: Steve French; +Cc: CIFS

Hi Steve,

We've been running the newer kernel - upgraded from 4.15.0 to 5.0.0
using Ubuntu 18.04's LTS Enablement Stack - for several days, and it
has certainly solved the frequent reconnection problem, so thanks for
letting me know about the reconnection fixes that went in almost a
year ago.

The only negative we are experiencing since the upgrade is that our
VMs sometimes become unresponsive - appearing to require a reboot -
with kernel messages like this:

[74146.705917] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[74146.716713] rcu:     (detected by 0, t=285034 jiffies, g=15795253, q=11)
[74146.718805] rcu: All QSes seen, last rcu_sched kthread activity
285035 (4313428844-4313143809), jiffies_till_next_fqs=1, root ->qsmask
0x0
[74146.723702] rcu: rcu_sched kthread starved for 285036 jiffies!
g15795253 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[74146.727649] rcu: RCU grace-period kthread stack dump:
[74160.609964] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cifsd:2854]
[74172.594002] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cifsd:2441]

I wonder if this could be a different symptom of the same underlying
problem?  Of course we would prefer not to have production VMs
becoming unresponsive, but maybe this way, we are detecting and
containing the problem earlier, so we can take the required action
(reboot), instead of waiting for our CIFS service providers or
authentication service providers to complain about the frequent
reconnections?

Cheers,
James

* Re: Frequent reconnections / session startups?
  2019-09-02  0:23   ` James Wettenhall
@ 2019-09-03 10:38     ` Aurélien Aptel
  2019-09-04  6:46       ` James Wettenhall
  0 siblings, 1 reply; 10+ messages in thread
From: Aurélien Aptel @ 2019-09-03 10:38 UTC (permalink / raw)
  To: James Wettenhall, Steve French; +Cc: CIFS

"James Wettenhall" <james.wettenhall@monash.edu> writes:
> The only negative we are experiencing since the upgrade is that our
> VMs sometimes become unresponsive - appearing to require a reboot -
> with kernel messages like this:

Are the VMs completely unresponsive or can you run commands in a
separate shell (assuming you're not touching the cifs mount in that shell)?

Does dmesg include a stack trace you would be willing to share?

Cheers,
-- 
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)

* Re: Frequent reconnections / session startups?
  2019-09-03 10:38     ` Aurélien Aptel
@ 2019-09-04  6:46       ` James Wettenhall
  2019-09-13 23:47         ` Pavel Shilovsky
  0 siblings, 1 reply; 10+ messages in thread
From: James Wettenhall @ 2019-09-04  6:46 UTC (permalink / raw)
  To: Aurélien Aptel; +Cc: Steve French, CIFS

Hi Aurélien,

The VMs become completely unresponsive, so we can't run commands in a
separate shell.

I've included a stack trace below.

I'm considering trying the cache=loose mount option.

Cheers,
James

Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.616360] INFO: task
dockerd:786 blocked for more than 120 seconds.
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.621073]       Not
tainted 5.0.0-25-generic #26~18.04.1-Ubuntu
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.625436] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629464] dockerd
D    0   786      1 0x00000000
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629467] Call Trace:
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629477]  __schedule+0x2bd/0x850
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629482]  ?
__switch_to_asm+0x35/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629484]  schedule+0x2c/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629485]
schedule_preempt_disabled+0xe/0x10
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629487]
__mutex_lock.isra.9+0x183/0x4e0
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629488]  ?
schedule_timeout+0x171/0x360
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629490]
__mutex_lock_slowpath+0x13/0x20
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629491]  ?
__mutex_lock_slowpath+0x13/0x20
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629492]  mutex_lock+0x2f/0x40
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629528]
smb2_reconnect+0x106/0x7f0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629531]  ? __switch_to+0x123/0x4e0
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629533]  ?
__switch_to_asm+0x35/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629537]  ?
__switch_to_asm+0x41/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629540]  ? wait_woken+0x80/0x80
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629556]
smb2_plain_req_init+0x34/0x270 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629577]
SMB2_open_init+0x6d/0x730 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629595]
SMB2_open+0x148/0x4f0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629609]  ?
SMB2_open+0x148/0x4f0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629624]
open_shroot+0x16c/0x210 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629637]  ?
open_shroot+0x16c/0x210 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629654]
smb2_query_path_info+0x11c/0x1b0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629656]  ? _cond_resched+0x19/0x40
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629660]  ?
kmem_cache_alloc_trace+0x151/0x1c0
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629673]
cifs_get_inode_info+0x3e3/0xb70 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629685]  ?
build_path_from_dentry_optional_prefix+0x103/0x430 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629699]
cifs_revalidate_dentry_attr+0xe9/0x3d0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629712]
cifs_getattr+0x5d/0x1a0 [cifs]
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629715]  ?
common_perm_cond+0x4c/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629719]
vfs_getattr_nosec+0x73/0x90
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629720]  vfs_getattr+0x36/0x40
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629721]  vfs_statx+0x8d/0xe0
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629723]
__do_sys_newlstat+0x3d/0x70
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629725]
__x64_sys_newlstat+0x16/0x20
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629729]  do_syscall_64+0x5a/0x120
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629731]
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629734] RIP: 0033:0x55fd5a4b1e40
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629739] Code: Bad RIP value.
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629741] RSP:
002b:000000c421af6948 EFLAGS: 00000212 ORIG_RAX: 0000000000000006
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629742] RAX:
ffffffffffffffda RBX: 0000000000000000 RCX: 000055fd5a4b1e40
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629743] RDX:
0000000000000000 RSI: 000000c421491488 RDI: 000000c4227a4060
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629743] RBP:
000000c421af69b0 R08: 0000000000000000 R09: 0000000000000000
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629744] R10:
0000000000000000 R11: 0000000000000212 R12: ffffffffffffffff
Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629745] R13:
0000000000000002 R14: 0000000000000001 R15: 0000000000000055

* Re: Frequent reconnections / session startups?
  2019-09-04  6:46       ` James Wettenhall
@ 2019-09-13 23:47         ` Pavel Shilovsky
       [not found]           ` <CAE78Er97k7O-GDGdMtp0qXtQ-q-1nS_d1AE6HHH+Kz6PV_G2uQ@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Shilovsky @ 2019-09-13 23:47 UTC (permalink / raw)
  To: James Wettenhall; +Cc: Aurélien Aptel, Steve French, CIFS

Hi James,

Thanks for providing this information.

The 5.0 kernel has a known bug in handling the cached root handle
which may cause the kernel to get stuck, as in your case.

To work around the problem, please mount with the "nohandlecache"
mount option.  This turns off caching of the root handle in the CIFS
module, so the problematic code path won't be executed.
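
For example (server, share, mount point and username are placeholders;
keep your other options as they are):

    sudo mount -t cifs //server/share /mnt/point \
        -o username=youruser,vers=3.0,sec=ntlmssp,cache=strict,nounix,serverino,mapposix,actimeo=1,nohandlecache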

Please let us know if this solves the problem for you.

--
Best regards,
Pavel Shilovsky

On Tue, Sep 3, 2019 at 11:47 PM James Wettenhall <james.wettenhall@monash.edu> wrote:
>
> Hi Aurélien,
>
> The VMs become completely unresponsive, so we can't run commands in a
> separate shell.
>
> I've included a stack trace below.
>
> I'm considering trying the cache=loose mount option.
>
> Cheers,
> James
>
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.616360] INFO: task
> dockerd:786 blocked for more than 120 seconds.
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.621073]       Not
> tainted 5.0.0-25-generic #26~18.04.1-Ubuntu
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.625436] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629464] dockerd
> D    0   786      1 0x00000000
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629467] Call Trace:
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629477]  __schedule+0x2bd/0x850
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629482]  ?
> __switch_to_asm+0x35/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629484]  schedule+0x2c/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629485]
> schedule_preempt_disabled+0xe/0x10
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629487]
> __mutex_lock.isra.9+0x183/0x4e0
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629488]  ?
> schedule_timeout+0x171/0x360
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629490]
> __mutex_lock_slowpath+0x13/0x20
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629491]  ?
> __mutex_lock_slowpath+0x13/0x20
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629492]  mutex_lock+0x2f/0x40
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629528]
> smb2_reconnect+0x106/0x7f0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629531]  ? __switch_to+0x123/0x4e0
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629533]  ?
> __switch_to_asm+0x35/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629537]  ?
> __switch_to_asm+0x41/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629540]  ? wait_woken+0x80/0x80
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629556]
> smb2_plain_req_init+0x34/0x270 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629577]
> SMB2_open_init+0x6d/0x730 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629595]
> SMB2_open+0x148/0x4f0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629609]  ?
> SMB2_open+0x148/0x4f0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629624]
> open_shroot+0x16c/0x210 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629637]  ?
> open_shroot+0x16c/0x210 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629654]
> smb2_query_path_info+0x11c/0x1b0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629656]  ? _cond_resched+0x19/0x40
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629660]  ?
> kmem_cache_alloc_trace+0x151/0x1c0
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629673]
> cifs_get_inode_info+0x3e3/0xb70 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629685]  ?
> build_path_from_dentry_optional_prefix+0x103/0x430 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629699]
> cifs_revalidate_dentry_attr+0xe9/0x3d0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629712]
> cifs_getattr+0x5d/0x1a0 [cifs]
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629715]  ?
> common_perm_cond+0x4c/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629719]
> vfs_getattr_nosec+0x73/0x90
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629720]  vfs_getattr+0x36/0x40
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629721]  vfs_statx+0x8d/0xe0
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629723]
> __do_sys_newlstat+0x3d/0x70
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629725]
> __x64_sys_newlstat+0x16/0x20
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629729]  do_syscall_64+0x5a/0x120
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629731]
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629734] RIP: 0033:0x55fd5a4b1e40
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629739] Code: Bad RIP value.
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629741] RSP:
> 002b:000000c421af6948 EFLAGS: 00000212 ORIG_RAX: 0000000000000006
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629742] RAX:
> ffffffffffffffda RBX: 0000000000000000 RCX: 000055fd5a4b1e40
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629743] RDX:
> 0000000000000000 RSI: 000000c421491488 RDI: 000000c4227a4060
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629743] RBP:
> 000000c421af69b0 R08: 0000000000000000 R09: 0000000000000000
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629744] R10:
> 0000000000000000 R11: 0000000000000212 R12: ffffffffffffffff
> Sep  4 13:36:36 prod-worker-1a kernel: [ 3384.629745] R13:
> 0000000000000002 R14: 0000000000000001 R15: 0000000000000055

* Re: Frequent reconnections / session startups?
       [not found]           ` <CAE78Er97k7O-GDGdMtp0qXtQ-q-1nS_d1AE6HHH+Kz6PV_G2uQ@mail.gmail.com>
@ 2019-09-18  5:23             ` James Wettenhall
  2019-09-18  6:49               ` ronnie sahlberg
  0 siblings, 1 reply; 10+ messages in thread
From: James Wettenhall @ 2019-09-18  5:23 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Aurélien Aptel, Steve French, CIFS

Thanks Pavel,

We've been running Kernel v5.2.14 over the past week (updated using
Ukuu) and it seems to have improved the situation considerably.

I assume that the "nohandlecache" mount option recommendation was just for v5.0.

Cheers,
James

* Re: Frequent reconnections / session startups?
  2019-09-18  5:23             ` James Wettenhall
@ 2019-09-18  6:49               ` ronnie sahlberg
  2019-09-18 17:58                 ` Pavel Shilovsky
  0 siblings, 1 reply; 10+ messages in thread
From: ronnie sahlberg @ 2019-09-18  6:49 UTC (permalink / raw)
  To: James Wettenhall; +Cc: Pavel Shilovsky, Aurélien Aptel, Steve French, CIFS

On Wed, Sep 18, 2019 at 4:16 PM James Wettenhall
<james.wettenhall@monash.edu> wrote:
>
> Thanks Pavel,
>
> We've been running Kernel v5.2.14 over the past week (updated using
> Ukuu) and it seems to have improved the situation considerably.

Thank you for the feedback.
This is very good news.


>
> I assume that the "nohandlecache" mount option recommendation was just for v5.0.
>
> Cheers,
> James

* Re: Frequent reconnections / session startups?
  2019-09-18  6:49               ` ronnie sahlberg
@ 2019-09-18 17:58                 ` Pavel Shilovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Shilovsky @ 2019-09-18 17:58 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: James Wettenhall, Aurélien Aptel, Steve French, CIFS

On Tue, Sep 17, 2019 at 11:49 PM ronnie sahlberg <ronniesahlberg@gmail.com> wrote:
>
> On Wed, Sep 18, 2019 at 4:16 PM James Wettenhall
> <james.wettenhall@monash.edu> wrote:
> >
> > Thanks Pavel,
> >
> > We've been running Kernel v5.2.14 over the past week (updated using
> > Ukuu) and it seems to have improved the situation considerably.
>
> Thank you for the feedback.
> This is very good news.
>
>
> >
> > I assume that the "nohandlecache" mount option recommendation was just for v5.0.

Glad to know that the situation has improved for your workload.
Thanks for the feedback.

The v5.2 kernel has many fixes preventing reconnects, which is probably
why you stopped observing the original problem.  That issue hasn't been
completely fixed in v5.2.y yet.  We have a patch in for-next that aims
to fix it, but it hasn't been sent to mainline yet; see

https://git.samba.org/?p=sfrench/cifs-2.6.git;a=commitdiff;h=96d9f7ed00b86104bf03adeffc8980897e9694ab.

Once it is there, it should be automatically picked up for backporting
to all active stable kernels it applies to.

In the meantime, if you start hitting the issue again, please try the
"nohandlecache" mount option as a workaround.

--
Best regards,
Pavel Shilovsky

Thread overview: 10+ messages
2019-08-26  6:55 Frequent reconnections / session startups? James Wettenhall
2019-08-26 14:55 ` Steve French
2019-08-28  1:50   ` James Wettenhall
2019-09-02  0:23   ` James Wettenhall
2019-09-03 10:38     ` Aurélien Aptel
2019-09-04  6:46       ` James Wettenhall
2019-09-13 23:47         ` Pavel Shilovsky
     [not found]           ` <CAE78Er97k7O-GDGdMtp0qXtQ-q-1nS_d1AE6HHH+Kz6PV_G2uQ@mail.gmail.com>
2019-09-18  5:23             ` James Wettenhall
2019-09-18  6:49               ` ronnie sahlberg
2019-09-18 17:58                 ` Pavel Shilovsky
