* DFS tests failing in buildbot
From: Shyam Prasad N @ 2022-03-08 11:38 UTC
  To: CIFS, ronnie sahlberg, Pavel Shilovsky, Steve French, Paulo Alcantara

Hi,

Once every few runs, we see the DFS tests failing in buildbot.
I did some digging into this, and here's my conclusion.
Please let me know if you can point out some issue with the root cause
or the fix.

There is a race condition that exists between cifsd and I/O threads
when the tcp connection is broken. The cifsd thread marks the
server/session/tcon structures for reconnect, and recreates the
socket, and sets 1 credit for this server. This only changes after the
next negotiate/session-setup completes, where it can get more credits.
During this window, if any ongoing I/O requires more than 1 credit,
then it will return with smb3_insufficient_credits (note that slightly
earlier in the same code, we identify reconnect with
smb3_reconnect_detected, but do nothing about it). The I/O will now
leak -EHOSTDOWN or -EAGAIN into userspace.
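
The window described above can be illustrated with a small userspace toy
model. This is only a sketch: struct toy_server and the toy_* helpers are
hypothetical names invented for illustration, not the kernel's structures
or functions, and -EAGAIN is picked arbitrarily as the error the I/O path
sees.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-server state. */
struct toy_server {
    int credits;        /* credits currently available to I/O threads */
    bool reconnecting;  /* set when the connection has been marked for reconnect */
};

/* Models the demultiplex thread reacting to a broken connection:
 * mark for reconnect and fall back to a single credit. */
static void toy_mark_reconnect(struct toy_server *srv)
{
    srv->reconnecting = true;
    srv->credits = 1;
}

/* Models negotiate/session setup completing and granting more credits. */
static void toy_session_setup_done(struct toy_server *srv, int granted)
{
    srv->reconnecting = false;
    srv->credits = granted;
}

/* Models an I/O thread asking for credits. Returns 0 on success, or
 * -EAGAIN (chosen for illustration) when too few credits are available. */
static int toy_take_credits(struct toy_server *srv, int needed)
{
    assert(needed > 0);
    if (srv->credits < needed)
        return -EAGAIN;
    srv->credits -= needed;
    return 0;
}
```

In this model, a request for more than one credit fails between
toy_mark_reconnect() and toy_session_setup_done(), while a single-credit
request still goes through.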

I feel that we should return a special error (-ERESTARTSYS?) when
smb3_reconnect_detected, and use this errno to ask the caller to
restart the syscall.

Ronnie/Pavel/Paulo: Please let me know what you think about this.

-- 
Regards,
Shyam


* Re: DFS tests failing in buildbot
From: Paulo Alcantara @ 2022-03-08 15:26 UTC
  To: Shyam Prasad N, CIFS, ronnie sahlberg, Pavel Shilovsky, Steve French

Shyam Prasad N <nspmangalore@gmail.com> writes:

> There is a race condition that exists between cifsd and I/O threads
> when the tcp connection is broken. The cifsd thread marks the
> server/session/tcon structures for reconnect, and recreates the
> socket, and sets 1 credit for this server. This only changes after the
> next negotiate/session-setup completes, where it can get more credits.
> During this window, if any ongoing I/O requires more than 1 credit,
> then it will return with smb3_insufficient_credits (note that slightly
> earlier in the same code, we identify reconnect with
> smb3_reconnect_detected, but do nothing about it). The I/O will now
> leak -EHOSTDOWN or -EAGAIN into userspace.

I don't see why returning either -EAGAIN or -EHOSTDOWN back to
userspace on *soft* mounts would be a problem.  Isn't this what we want?
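
As a rough sketch of how an application on a soft mount might cope with
these errors, the helper below retries an operation a bounded number of
times when it fails with EAGAIN or EHOSTDOWN. The names retry_transient,
io_fn, flaky_ctx and flaky_op are hypothetical and exist only for this
illustration; the attempt count and delay are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical callback type: returns >= 0 on success, -1 with errno set. */
typedef ssize_t (*io_fn)(void *ctx);

/* Retry op() while it fails with EAGAIN or EHOSTDOWN, at most max_attempts
 * times, sleeping delay_ms between attempts. Any other error is returned
 * immediately. errno from the last failed attempt is left intact. */
static ssize_t retry_transient(io_fn op, void *ctx, int max_attempts, long delay_ms)
{
    ssize_t ret = -1;

    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        ret = op(ctx);
        if (ret >= 0)
            return ret;
        if (errno != EAGAIN && errno != EHOSTDOWN)
            return ret;
        if (attempt < max_attempts) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return ret;
}

/* Stand-in operation for the example: fails a fixed number of times. */
struct flaky_ctx {
    int failures_left;  /* how many more calls should fail */
    int calls;          /* how many times the operation was invoked */
    int err;            /* errno value to report on failure */
};

static ssize_t flaky_op(void *p)
{
    struct flaky_ctx *c = p;

    c->calls++;
    if (c->failures_left > 0) {
        c->failures_left--;
        errno = c->err;
        return -1;
    }
    return 5;
}
```

Whether retrying is appropriate depends on the application; on a soft
mount the error is the signal that the operation did not complete.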

If the syscall gets signaled while we are waiting for the TCP connection
to be re-established, then we return -ERESTARTSYS.  See
wait_event_interruptible_timeout() in smb2_reconnect().

> I feel that we should return a special error (-ERESTARTSYS?) when
> smb3_reconnect_detected, and use this errno to ask the caller to
> restart the syscall.

Userspace doesn't handle -ERESTARTSYS.  When we return -ERESTARTSYS from
a signaled syscall, this means that the kernel will either handle the
signal and restart the syscall from the beginning, or return -EINTR back
to userspace.
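
From the application side, which of the two outcomes applies to a caught
signal is selected by the SA_RESTART flag passed to sigaction(). A minimal
sketch follows; install_handler, handler_restarts and noop_handler are
hypothetical helper names used only here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Handler that does nothing; it only exists so the signal is caught. */
static void noop_handler(int signo)
{
    (void)signo;
}

/* Install noop_handler for signo. With restart == true the handler is
 * registered with SA_RESTART, so an interrupted restartable syscall is
 * restarted; otherwise the syscall fails with EINTR. */
static int install_handler(int signo, bool restart)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(signo, &sa, NULL);
}

/* Report whether the current disposition for signo has SA_RESTART set. */
static bool handler_restarts(int signo)
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) != 0)
        return false;
    return (cur.sa_flags & SA_RESTART) != 0;
}
```

Either way, the application only ever sees a completed (restarted)
syscall or EINTR, never ERESTARTSYS itself.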
