Linux-CIFS Archive on lore.kernel.org
* A process killed while opening a file can result in leaked open handle on the server
@ 2019-11-12 19:34 Frank Sorenson
  2019-11-13  2:37 ` Pavel Shilovsky
  2019-11-13  3:28 ` Ronnie Sahlberg
  0 siblings, 2 replies; 10+ messages in thread
From: Frank Sorenson @ 2019-11-12 19:34 UTC (permalink / raw)
  To: linux-cifs

If a process opening a file is killed while waiting for a SMB2_CREATE
response from the server, the response may not be handled by the client,
leaking an open file handle on the server.

This can be reproduced with the following:

# mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
# cd /mnt/vm3
# echo foo > foo

# for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done

(increase count if necessary--100 appears sufficient to cause multiple leaked file handles)


The client will stop waiting for the response, and will output
the following when the response does arrive:

    CIFS VFS: Close unmatched open

On the server side, an open file handle is leaked.  If using Samba,
the following will show these open files:


# smbstatus | grep -i Locked -A1000
Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
25936        501        DENY_NONE  0x80        RDONLY     NONE             /home/user1   .   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x80        RDONLY     NONE             /home/user1   .   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019


(also tracked https://bugzilla.redhat.com/show_bug.cgi?id=1771691)

Frank
--
Frank Sorenson
sorenson@redhat.com
Principal Software Maintenance Engineer
Global Support Services - filesystems
Red Hat


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server Frank Sorenson
@ 2019-11-13  2:37 ` Pavel Shilovsky
  2019-11-13  5:19   ` Ronnie Sahlberg
  2019-11-13  3:28 ` Ronnie Sahlberg
  1 sibling, 1 reply; 10+ messages in thread
From: Pavel Shilovsky @ 2019-11-13  2:37 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: linux-cifs

[-- Attachment #1: Type: text/plain, Size: 3721 bytes --]

Tue, 12 Nov 2019 at 11:34, Frank Sorenson <sorenson@redhat.com>:
>
> If a process opening a file is killed while waiting for a SMB2_CREATE
> response from the server, the response may not be handled by the client,
> leaking an open file handle on the server.
>

Thanks for reporting the problem.

> This can be reproduced with the following:
>
> # mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
> # cd /mnt/vm3
> # echo foo > foo
>
> # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done
>
> (increase count if necessary--100 appears sufficient to cause multiple leaked file handles)
>

This is a known problem, and the client handles it by closing unmatched
opens (the ones that don't have a corresponding waiting thread)
immediately. It is indicated by the message you observed: "Close
unmatched open".

>
> The client will stop waiting for the response, and will output
> the following when the response does arrive:
>
>     CIFS VFS: Close unmatched open
>
> on the server side, an open file handle is leaked.  If using samba,
> the following will show these open files:
>
>
> # smbstatus | grep -i Locked -A1000
> Locked files:
> Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
> --------------------------------------------------------------------------------------------------
> 25936        501        DENY_NONE  0x80        RDONLY     NONE             /home/user1   .   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x80        RDONLY     NONE             /home/user1   .   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo   Tue Nov 12 12:29:24 2019
>

I tried it locally myself; it seems that we have another issue, related
to interrupted close requests:

[1656476.740311] fs/cifs/file.c: Flush inode 0000000078729371 file
000000004d5f9348 rc 0
[1656476.740313] fs/cifs/file.c: closing last open instance for inode
0000000078729371
[1656476.740314] fs/cifs/file.c: CIFS VFS: in _cifsFileInfo_put as
Xid: 457 with uid: 1000
[1656476.740315] fs/cifs/smb2pdu.c: Close
[1656476.740317] fs/cifs/transport.c: signal is pending before sending any data

This will return a -512 (-ERESTARTSYS) error to the upper layer, but we do
not save the problematic handle anywhere to close it later. The
op->release() error code is not checked by the VFS anyway.

We should do the same thing we do today for cancelled mids:
allocate a "struct close_cancelled_open" and queue the lazy work to
close the handle.

I attached the patch implementing this idea. The patch handles the
interrupt errors in smb2_close_file(), which is called during the close
system call. It fixed the problem in my setup. At the same time, I
think I should probably move the handling to the PDU-layer function
SMB2_close() to handle *all* interrupted close requests, including ones
closing temporary handles. I am wondering if anyone has thoughts about
this. Anyway, review of the patch is highly appreciated.

@Frank, could you please test the patch in your environment?

--
Best regards,
Pavel Shilovsky

[-- Attachment #2: 0001-CIFS-Close-open-handle-after-interrupted-close.patch --]
[-- Type: application/octet-stream, Size: 4120 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server Frank Sorenson
  2019-11-13  2:37 ` Pavel Shilovsky
@ 2019-11-13  3:28 ` Ronnie Sahlberg
  1 sibling, 0 replies; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-13  3:28 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: linux-cifs

[-- Attachment #1: Type: text/plain, Size: 3309 bytes --]





----- Original Message -----
> From: "Frank Sorenson" <sorenson@redhat.com>
> To: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Wednesday, 13 November, 2019 5:34:27 AM
> Subject: A process killed while opening a file can result in leaked open handle on the server
> 
> If a process opening a file is killed while waiting for a SMB2_CREATE
> response from the server, the response may not be handled by the client,
> leaking an open file handle on the server.
> 
> This can be reproduced with the following:
> 
> # mount //vm3/user1 /mnt/vm3
> -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
> # cd /mnt/vm3
> # echo foo > foo
> 
> # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $!
> ; done
> 
> (increase count if necessary--100 appears sufficient to cause multiple leaked
> file handles)
> 
> 
> The client will stop waiting for the response, and will output
> the following when the response does arrive:
> 
>     CIFS VFS: Close unmatched open
> 
> on the server side, an open file handle is leaked.  If using samba,
> the following will show these open files:
> 
> 
> # smbstatus | grep -i Locked -A1000
> Locked files:
> Pid          Uid        DenyMode   Access      R/W        Oplock
> SharePath   Name   Time
> --------------------------------------------------------------------------------------------------
> 25936        501        DENY_NONE  0x80        RDONLY     NONE
> /home/user1   .   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x80        RDONLY     NONE
> /home/user1   .   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> /home/user1   foo   Tue Nov 12 12:29:24 2019
> 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> /home/user1   foo   Tue Nov 12 12:29:24 2019
> 
> 
> (also tracked https://bugzilla.redhat.com/show_bug.cgi?id=1771691)
> 

I don't think it has to do with "CIFS VFS: Close unmatched open"; in fact, those are probably the cases
where it did not leak, unless the SMB2_close() fails with an error.
Can you try the patch below and check that these SMB2_close() calls are successful?

I think it is more likely that we leak because we cannot match the SMB2_Create() response with a MID.
Do you get "No task to wake, unknown frame received" in the log?

We need to do two things here:
1. Find out why we cannot match this response with a MID; this is to fix THIS bug.
2. In the demultiplex thread, where we get a "No task to wake, unknown frame received", check whether
   it is an Open; if so, invoke server->ops->handle_cancelled_mid().
   This is to make sure other similar bugs do not cause leaks like this.

regards
ronnie sahlberg


> Frank
> --
> Frank Sorenson
> sorenson@redhat.com
> Principal Software Maintenance Engineer
> Global Support Services - filesystems
> Red Hat
> 
> 

[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 1575 bytes --]

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index d78bfcc19156..10cb4d4fb122 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1548,6 +1548,7 @@ struct mid_q_entry {
 };
 
 struct close_cancelled_open {
+	__u64 mid;
 	struct cifs_fid         fid;
 	struct cifs_tcon        *tcon;
 	struct work_struct      work;
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index e311f58dc1c8..6120c307d88e 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -735,12 +735,19 @@ smb2_cancelled_close_fid(struct work_struct *work)
 {
 	struct close_cancelled_open *cancelled = container_of(work,
 					struct close_cancelled_open, work);
+	struct cifs_tcon *tcon = cancelled->tcon;
+	int rc;
 
-	cifs_dbg(VFS, "Close unmatched open\n");
+	cifs_tcon_dbg(VFS, "Close unmatched open for MID:%llx\n",
+		      cancelled->mid);
 
-	SMB2_close(0, cancelled->tcon, cancelled->fid.persistent_fid,
-		   cancelled->fid.volatile_fid);
-	cifs_put_tcon(cancelled->tcon);
+	rc = SMB2_close(0, tcon, cancelled->fid.persistent_fid,
+			cancelled->fid.volatile_fid);
+	if (rc) {
+		cifs_tcon_dbg(VFS, "Close cancelled mid failed with rc:%d\n", rc);
+	}
+
+	cifs_put_tcon(tcon);
 	kfree(cancelled);
 }
 
@@ -770,6 +777,7 @@ smb2_handle_cancelled_mid(char *buffer, struct TCP_Server_Info *server)
 	cancelled->fid.persistent_fid = rsp->PersistentFileId;
 	cancelled->fid.volatile_fid = rsp->VolatileFileId;
 	cancelled->tcon = tcon;
+	cancelled->mid = sync_hdr->MessageId;
 	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
 	queue_work(cifsiod_wq, &cancelled->work);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  2:37 ` Pavel Shilovsky
@ 2019-11-13  5:19   ` Ronnie Sahlberg
  2019-11-13  6:49     ` Ronnie Sahlberg
  0 siblings, 1 reply; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-13  5:19 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Frank Sorenson, linux-cifs





----- Original Message -----
> From: "Pavel Shilovsky" <piastryyy@gmail.com>
> To: "Frank Sorenson" <sorenson@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Wednesday, 13 November, 2019 12:37:38 PM
> Subject: Re: A process killed while opening a file can result in leaked open handle on the server
> 
> Tue, 12 Nov 2019 at 11:34, Frank Sorenson <sorenson@redhat.com>:
> >
> > If a process opening a file is killed while waiting for a SMB2_CREATE
> > response from the server, the response may not be handled by the client,
> > leaking an open file handle on the server.
> >
> 
> Thanks for reporting the problem.
> 
> > This can be reproduced with the following:
> >
> > # mount //vm3/user1 /mnt/vm3
> > -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
> > # cd /mnt/vm3
> > # echo foo > foo
> >
> > # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9
> > $! ; done
> >
> > (increase count if necessary--100 appears sufficient to cause multiple
> > leaked file handles)
> >
> 
> This is a known problem and the client handles it by closing unmatched
> opens (the one that don't have a corresponding waiting thread)
> immediately. It is indicated by the message you observed: "Close
> unmatched open".
> 
> >
> > The client will stop waiting for the response, and will output
> > the following when the response does arrive:
> >
> >     CIFS VFS: Close unmatched open
> >
> > on the server side, an open file handle is leaked.  If using samba,
> > the following will show these open files:
> >
> >
> > # smbstatus | grep -i Locked -A1000
> > Locked files:
> > Pid          Uid        DenyMode   Access      R/W        Oplock
> > SharePath   Name   Time
> > --------------------------------------------------------------------------------------------------
> > 25936        501        DENY_NONE  0x80        RDONLY     NONE
> > /home/user1   .   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x80        RDONLY     NONE
> > /home/user1   .   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> >
> 
> I tried it locally myself, it seems that we have another issue related
> to interrupted close requests:
> 
> [1656476.740311] fs/cifs/file.c: Flush inode 0000000078729371 file
> 000000004d5f9348 rc 0
> [1656476.740313] fs/cifs/file.c: closing last open instance for inode
> 0000000078729371
> [1656476.740314] fs/cifs/file.c: CIFS VFS: in _cifsFileInfo_put as
> Xid: 457 with uid: 1000
> [1656476.740315] fs/cifs/smb2pdu.c: Close
> [1656476.740317] fs/cifs/transport.c: signal is pending before sending any
> data
> 
> This will return -512 error to the upper layer but we do not save a
> problematic handle somewhere to close it later. op->release() error
> code is not being checked by VFS anyway.
> 
> We should do the similar thing as we do today for cancelled mids:
> allocate "struct close_cancelled_open" and queue the lazy work to
> close the handle.
> 
> I attached the patch implementing this idea. The patch handles the
> interrupt errors in smb2_close_file which is called during close
> system call. It fixed the problem in my setup. In the same time I
> think I should probably move the handling to PDU layer function
> SMB2_close() to handle *all* interrupted close requests including ones
> closing temporary handles. I am wondering if anyone has thoughts about
> it. Anyway, review of the patch is highly appreciated.

I think your patch makes it better. I can see interrupted Close() calls for which your patch now fixes the leak.


But there is still a leak in Open().
I just got one instance where I leaked one handle.

The log shows:
CIFS VFS: \\192.168.124.207 Cancelling wait for mid 25 cmd: 5

and wireshark shows an Open() request/response for this MID, but we never
send a Close() for the handle.



> 
> @Frank, could you please test the patch in your environment?
> 
> --
> Best regards,
> Pavel Shilovsky
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  5:19   ` Ronnie Sahlberg
@ 2019-11-13  6:49     ` Ronnie Sahlberg
  2019-11-13 22:15       ` Frank Sorenson
  0 siblings, 1 reply; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-13  6:49 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Frank Sorenson, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 6332 bytes --]

Steve, Pavel

This patch goes on top of Pavel's patch.
Maybe it should be merged with Pavel's patch, since his patch changes "we only send a close() on an interrupted open()"
to "we send a close() on either an interrupted open() or an interrupted close()", so both comments and log messages are updated.

Additionally, it adds logging of the MID that failed in the case of an interrupted Open(), so that it is easy to find it in wireshark
and check whether that smb2 file handle was indeed handled by an SMB2_close() or not.


From testing, it appears Pavel's patch works. When the close() is interrupted we don't leak handles, as far as I can tell.
We do still have a leak in the Open() case though: it seems that even though we set things up and flag the MID to be cancelled, we never end up
calling smb2_cancelled_close_fid() and thus never send an SMB2_close().
I haven't found the root cause yet, but I suspect we mess up the mid flags or state somewhere.


It did work in the past, when Sachin provided the initial implementation, so I think we have regressed.
I have added a new test, 'cifs/102', to the buildbot that checks for this, but have not integrated it into the cifs-testing run yet since we still fail this test.
At least we will not have further regressions once we fix this and enable the test in the future.

ronnie s




----- Original Message -----
From: "Ronnie Sahlberg" <lsahlber@redhat.com>
To: "Pavel Shilovsky" <piastryyy@gmail.com>
Cc: "Frank Sorenson" <sorenson@redhat.com>, "linux-cifs" <linux-cifs@vger.kernel.org>
Sent: Wednesday, 13 November, 2019 3:19:11 PM
Subject: Re: A process killed while opening a file can result in leaked open handle on the server





----- Original Message -----
> From: "Pavel Shilovsky" <piastryyy@gmail.com>
> To: "Frank Sorenson" <sorenson@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Wednesday, 13 November, 2019 12:37:38 PM
> Subject: Re: A process killed while opening a file can result in leaked open handle on the server
> 
> Tue, 12 Nov 2019 at 11:34, Frank Sorenson <sorenson@redhat.com>:
> >
> > If a process opening a file is killed while waiting for a SMB2_CREATE
> > response from the server, the response may not be handled by the client,
> > leaking an open file handle on the server.
> >
> 
> Thanks for reporting the problem.
> 
> > This can be reproduced with the following:
> >
> > # mount //vm3/user1 /mnt/vm3
> > -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
> > # cd /mnt/vm3
> > # echo foo > foo
> >
> > # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9
> > $! ; done
> >
> > (increase count if necessary--100 appears sufficient to cause multiple
> > leaked file handles)
> >
> 
> This is a known problem and the client handles it by closing unmatched
> opens (the one that don't have a corresponding waiting thread)
> immediately. It is indicated by the message you observed: "Close
> unmatched open".
> 
> >
> > The client will stop waiting for the response, and will output
> > the following when the response does arrive:
> >
> >     CIFS VFS: Close unmatched open
> >
> > on the server side, an open file handle is leaked.  If using samba,
> > the following will show these open files:
> >
> >
> > # smbstatus | grep -i Locked -A1000
> > Locked files:
> > Pid          Uid        DenyMode   Access      R/W        Oplock
> > SharePath   Name   Time
> > --------------------------------------------------------------------------------------------------
> > 25936        501        DENY_NONE  0x80        RDONLY     NONE
> > /home/user1   .   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x80        RDONLY     NONE
> > /home/user1   .   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> > 25936        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)
> > /home/user1   foo   Tue Nov 12 12:29:24 2019
> >
> 
> I tried it locally myself, it seems that we have another issue related
> to interrupted close requests:
> 
> [1656476.740311] fs/cifs/file.c: Flush inode 0000000078729371 file
> 000000004d5f9348 rc 0
> [1656476.740313] fs/cifs/file.c: closing last open instance for inode
> 0000000078729371
> [1656476.740314] fs/cifs/file.c: CIFS VFS: in _cifsFileInfo_put as
> Xid: 457 with uid: 1000
> [1656476.740315] fs/cifs/smb2pdu.c: Close
> [1656476.740317] fs/cifs/transport.c: signal is pending before sending any
> data
> 
> This will return -512 error to the upper layer but we do not save a
> problematic handle somewhere to close it later. op->release() error
> code is not being checked by VFS anyway.
> 
> We should do the similar thing as we do today for cancelled mids:
> allocate "struct close_cancelled_open" and queue the lazy work to
> close the handle.
> 
> I attached the patch implementing this idea. The patch handles the
> interrupt errors in smb2_close_file which is called during close
> system call. It fixed the problem in my setup. In the same time I
> think I should probably move the handling to PDU layer function
> SMB2_close() to handle *all* interrupted close requests including ones
> closing temporary handles. I am wondering if anyone has thoughts about
> it. Anyway, review of the patch is highly appreciated.

I think your patch makes it better. I can see interrupted Close() calls for which your patch now fixes the leak.


But there is still a leak in Open().
I just got one instance where I leaked one handle.

The log shows :
CIFS VFS: \\192.168.124.207 Cancelling wait for mid 25 cmd: 5

and the wireshark shows an Open() request/response for this MID but we never
send a Close() for the handle.



> 
> @Frank, could you please test the patch in your environment?
> 
> --
> Best regards,
> Pavel Shilovsky
> 


[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 3254 bytes --]

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index d78bfcc19156..39a9165f9bba 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1551,6 +1551,8 @@ struct close_cancelled_open {
 	struct cifs_fid         fid;
 	struct cifs_tcon        *tcon;
 	struct work_struct      work;
+	__u64 mid;
+	__u16 cmd;
 };
 
 /*	Make code in transport.c a little cleaner by moving
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index ba1e5754239c..1fb9932d3dae 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -735,19 +735,36 @@ smb2_cancelled_close_fid(struct work_struct *work)
 {
 	struct close_cancelled_open *cancelled = container_of(work,
 					struct close_cancelled_open, work);
+	struct cifs_tcon *tcon = cancelled->tcon;
+	int rc;
 
-	cifs_dbg(VFS, "Close unmatched open\n");
+	if (cancelled->mid)
+		cifs_tcon_dbg(VFS, "Close unmatched open for MID:%llx\n",
+			      cancelled->mid);
+	else
+		cifs_tcon_dbg(VFS, "Close interrupted close\n");
 
-	SMB2_close(0, cancelled->tcon, cancelled->fid.persistent_fid,
-		   cancelled->fid.volatile_fid);
-	cifs_put_tcon(cancelled->tcon);
+	rc = SMB2_close(0, tcon, cancelled->fid.persistent_fid,
+			cancelled->fid.volatile_fid);
+	if (rc) {
+		cifs_tcon_dbg(VFS, "Close cancelled mid failed with rc:%d\n", rc);
+	}
+
+	cifs_put_tcon(tcon);
 	kfree(cancelled);
 }
 
-/* Caller should already has an extra reference to @tcon */
+/*
+ * Caller should already hold an extra reference to @tcon
+ * This function is used to queue work to close a handle to prevent leaks
+ * on the server.
+ * We handle two cases. If an open was interrupted after we sent the
+ * SMB2_CREATE to the server but before we processed the reply, and second
+ * if a close was interrupted before we sent the SMB2_CLOSE to the server.
+ */
 static int
-__smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
-			      __u64 volatile_fid)
+__smb2_handle_cancelled_cmd(struct cifs_tcon *tcon, __u16 cmd, __u64 mid,
+			    __u64 persistent_fid, __u64 volatile_fid)
 {
 	struct close_cancelled_open *cancelled;
 
@@ -758,6 +775,8 @@ __smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
 	cancelled->fid.persistent_fid = persistent_fid;
 	cancelled->fid.volatile_fid = volatile_fid;
 	cancelled->tcon = tcon;
+	cancelled->cmd = cmd;
+	cancelled->mid = mid;
 	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
 	WARN_ON(queue_work(cifsiod_wq, &cancelled->work) == false);
 
@@ -775,7 +794,8 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
 	tcon->tc_count++;
 	spin_unlock(&cifs_tcp_ses_lock);
 
-	rc = __smb2_handle_cancelled_close(tcon, persistent_fid, volatile_fid);
+	rc = __smb2_handle_cancelled_cmd(tcon, SMB2_CLOSE_HE, 0,
+					 persistent_fid, volatile_fid);
 	if (rc)
 		cifs_put_tcon(tcon);
 
@@ -799,8 +819,11 @@ smb2_handle_cancelled_mid(char *buffer, struct TCP_Server_Info *server)
 	if (!tcon)
 		return -ENOENT;
 
-	rc = __smb2_handle_cancelled_close(tcon, rsp->PersistentFileId,
-					   rsp->VolatileFileId);
+	rc = __smb2_handle_cancelled_cmd(tcon,
+					 le16_to_cpu(sync_hdr->Command),
+					 le64_to_cpu(sync_hdr->MessageId),
+					 rsp->PersistentFileId,
+					 rsp->VolatileFileId);
 	if (rc)
 		cifs_put_tcon(tcon);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  6:49     ` Ronnie Sahlberg
@ 2019-11-13 22:15       ` Frank Sorenson
  2019-11-14  1:39         ` Ronnie Sahlberg
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2019-11-13 22:15 UTC (permalink / raw)
  To: Ronnie Sahlberg, Pavel Shilovsky; +Cc: linux-cifs

On 11/13/19 12:49 AM, Ronnie Sahlberg wrote:
> Steve, Pavel
> 
> This patch goes ontop of Pavels patch.
> Maybe it should be merged with Pavels patch since his patch changes from "we only send a close() on an interrupted open()"
> to now "we send a close() on either interrupted open() or interrupted close()" so both comments as well as log messages are updates.
> 
> Additionally it adds logging of the MID that failed in the case of an interrupted Open() so that it is easy to find it in wireshark
> and check whether that smb2 file handle was indeed handles by a SMB_Close() or not.
> 
> 
> From testing it appears Pavels patch works. When the close() is interrupted we don't leak handles as far as I can tell.
> We do have a leak in the Open() case though and it seems that eventhough we set things up and flags the MID to be cancelled we actually never end up
> calling smb2_cancelled_close_fid() and thus we never send a SMB2_Close().
> I haven't found the root cause yet but I suspect we mess up mid flags or state somewhere.
> 
> 
> It did work in the past though when Sachin provided the initial implementation so we have regressed I think.
> I have added a new test 'cifs/102'  to the buildbot that checks for this but have not integrated into the cifs-testing run yet since we still fail this test.
> At least we will not have further regressions once we fix this and enable the test in the future.
> 
> ronnie s

The patches do indeed improve it significantly.

I'm still seeing some leaks as well, and I'm removing the ratelimiting so
that I can see what the added debugging is trying to tell us.  I'll
report back if I find more details.


Thanks for the help.


Frank


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13 22:15       ` Frank Sorenson
@ 2019-11-14  1:39         ` Ronnie Sahlberg
  2019-11-15 18:21           ` Pavel Shilovskiy
  2019-11-17 16:29           ` Frank Sorenson
  0 siblings, 2 replies; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-14  1:39 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Pavel Shilovsky, linux-cifs





----- Original Message -----
> From: "Frank Sorenson" <sorenson@redhat.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "Pavel Shilovsky" <piastryyy@gmail.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Thursday, 14 November, 2019 8:15:46 AM
> Subject: Re: A process killed while opening a file can result in leaked open handle on the server
> 
> On 11/13/19 12:49 AM, Ronnie Sahlberg wrote:
> > Steve, Pavel
> > 
> > This patch goes ontop of Pavels patch.
> > Maybe it should be merged with Pavels patch since his patch changes from
> > "we only send a close() on an interrupted open()"
> > to now "we send a close() on either interrupted open() or interrupted
> > close()" so both comments as well as log messages are updates.
> > 
> > Additionally it adds logging of the MID that failed in the case of an
> > interrupted Open() so that it is easy to find it in wireshark
> > and check whether that smb2 file handle was indeed handles by a SMB_Close()
> > or not.
> > 
> > 
> > From testing it appears Pavels patch works. When the close() is interrupted
> > we don't leak handles as far as I can tell.
> > We do have a leak in the Open() case though and it seems that eventhough we
> > set things up and flags the MID to be cancelled we actually never end up
> > calling smb2_cancelled_close_fid() and thus we never send a SMB2_Close().
> > I haven't found the root cause yet but I suspect we mess up mid flags or
> > state somewhere.
> > 
> > 
> > It did work in the past though when Sachin provided the initial
> > implementation so we have regressed I think.
> > I have added a new test 'cifs/102'  to the buildbot that checks for this
> > but have not integrated into the cifs-testing run yet since we still fail
> > this test.
> > At least we will not have further regressions once we fix this and enable
> > the test in the future.
> > 
> > ronnie s
> 
> The patches do indeed improve it significantly.
> 
> I'm still seeing some leak as well, and I'm removing ratelimiting so
> that I can see what the added debugging is trying to tell us.  I'll
> report if I find more details.
> 

We are making progress.
Can you test this patch and see if it improves things further for you?
It fixes most, but not all, of the leaks I see for interrupted open():

I will post this to the list too as a separate mail/patch.


Author: Ronnie Sahlberg <lsahlber@redhat.com>
Date:   Thu Nov 14 11:23:06 2019 +1000

    cifs: fix race between compound_send_recv() and the demultiplex thread
    
    There is a race where an open() may be interrupted after we have received
    the reply but before we have invoked the callback, in which case we never
    end up calling handle_cancelled_mid() and thus leak an open handle on the
    server.
    
    Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index ccaa8bad336f..802604a7e692 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p)
                        if (mids[i] != NULL) {
                                mids[i]->resp_buf_size = server->pdu_size;
                                if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) &&
-                                   mids[i]->mid_state == MID_RESPONSE_RECEIVED &&
                                    server->ops->handle_cancelled_mid)
                                        server->ops->handle_cancelled_mid(
                                                        mids[i]->resp_buf,
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index ca3de62688d6..28018a7eccb2 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -1119,7 +1119,8 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses,
                                 midQ[i]->mid, le16_to_cpu(midQ[i]->command));
                        send_cancel(server, &rqst[i], midQ[i]);
                        spin_lock(&GlobalMid_Lock);
-                       if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) {
+                       if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED ||
+                           midQ[i]->mid_state == MID_RESPONSE_RECEIVED) {
                                midQ[i]->mid_flags |= MID_WAIT_CANCELLED;
                                midQ[i]->callback = cifs_cancelled_callback;
                                cancelled_mid[i] = true;





> 
> Thanks for the help.
> 
> 
> Frank
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: A process killed while opening a file can result in leaked open handle on the server
  2019-11-14  1:39         ` Ronnie Sahlberg
@ 2019-11-15 18:21           ` Pavel Shilovskiy
  2019-11-17 16:29           ` Frank Sorenson
  1 sibling, 0 replies; 10+ messages in thread
From: Pavel Shilovskiy @ 2019-11-15 18:21 UTC (permalink / raw)
  To: Ronnie Sahlberg, Frank Sorenson, Steven French
  Cc: Pavel Shilovsky, linux-cifs

Thanks Frank and Ronnie for testing my patch! I also like Ronnie's patch - good progress overall!

About my patch: I am going to add a proper description once I get time next week, and will probably make some improvements:

1) Currently, with my patch, the client will send Close even if it has already received a response but wait_for_response() was interrupted. The latter function returns -ERESTARTSYS. What we want to do instead is to send Close only if *sending* was interrupted; in that case we return -EINTR, see __smb_send_rqst():

if (signal_pending(current)) {
    cifs_dbg(FYI, "signal is pending before sending any data\n");
    return -EINTR;
}

2) Move the error handling to the PDU layer function SMB2_close() to handle *all* interrupted close requests including ones closing temporary handles.

Please comment on the two changes above!

--
Best regards,
Pavel Shilovsky


* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-14  1:39         ` Ronnie Sahlberg
  2019-11-15 18:21           ` Pavel Shilovskiy
@ 2019-11-17 16:29           ` Frank Sorenson
  2019-11-21 19:41             ` Pavel Shilovsky
  1 sibling, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2019-11-17 16:29 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: Pavel Shilovsky, linux-cifs

On 11/13/19 7:39 PM, Ronnie Sahlberg wrote:

> We are making progress.

Agreed.  We're definitely making progress.

> Can you test this patch if it improves even more for you?
> It fixes most but not all the leaks I see for interrupted open():
> 
> I will post this to the list too as a separate mail/patch.


Sorry to be slow on the testing.


I might be seeing some small improvement with this one, but I'm still seeing some mismatches:

# for i in {1..100} ; do cat /mnt/vm3/foo.$i >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done
...

This ended up with 2 open handles on the server side:

21842        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo.32   Sun Nov 17 09:13:38 2019
21842        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   foo.48   Sun Nov 17 09:13:38 2019


the packet capture shows the same mismatch pattern for these two:

102  Create Request File: foo.32;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request 
103  Create Response File: foo.32;GetInfo Response;Close Response 
104  Create Request File: foo.32 
105  Create Response File: foo.32 


148  Create Request File: foo.48;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request 
149  Create Response File: foo.48;GetInfo Response;Close Response 
150  Create Request File: foo.48 
151  Create Response File: foo.48 

with no close for those two.


the messages are also similar, and show transmitting the second open request and cancelling the wait immediately afterward:
[9006] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1091 with uid: 0
[9006] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.32 and dentry = 0x0000000053e436bf
[9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32
[9006] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup
[9006] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.32 inode = 0x000000004667ea0b
[9006] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.32
[9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356
[9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=113 state=4
[9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=16 mid=114 state=4
[9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=115 state=4
[9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9006] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=5934909
[9006] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 5934909
[9006] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 5934909 mapping
[9006] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1091) rc = 0
[9006] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as Xid: 1092 with uid: 0
[9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32
[9006] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x000000001a16a2ae file flags are 0x8000 for \foo.32
[9006] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open
[9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284
CIFS VFS: \\vm3 Cancelling wait for mid 116 cmd: 5
[9006] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1092) rc = -512



[9039] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1109 with uid: 0
[9039] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.48 and dentry = 0x0000000040aea0d9
[9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48
[9039] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup
[9039] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.48 inode = 0x000000004667ea0b
[9039] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.48
[9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356
[9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=158 state=4
[9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=16 mid=159 state=4
[9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=160 state=4
[9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[9039] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=21857488
[9039] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 21857488
[9039] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 21857488 mapping
[9039] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1109) rc = 0
[9039] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as Xid: 1110 with uid: 0
[9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48
[9039] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x00000000001a4f79 file flags are 0x8000 for \foo.48
[9039] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open
[9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284
CIFS VFS: \\vm3 Cancelling wait for mid 161 cmd: 5
[9039] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1110) rc = -512


Frank



* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-17 16:29           ` Frank Sorenson
@ 2019-11-21 19:41             ` Pavel Shilovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Shilovsky @ 2019-11-21 19:41 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Ronnie Sahlberg, linux-cifs

Hi Frank,

I have just posted 3 patches to the list to address remaining handle leaks:

CIFS: Close open handle after interrupted close
  (2nd version of my original patch to handle interrupted closes)
CIFS: Fix NULL pointer dereference in mid callback
  (not a handle leak fix, but the kernel kept crashing on my system
  during intensive testing for handle leaks)
CIFS: Do not miss cancelled OPEN responses
  (fixes some unmatched opens that are not processed right now due to races)

Could you please test them in your environment? I ran a script similar
to what you use for repro (just more iterations) and Samba didn't show
leaked handles afterwards.

--
Best regards,
Pavel Shilovsky



end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server Frank Sorenson
2019-11-13  2:37 ` Pavel Shilovsky
2019-11-13  5:19   ` Ronnie Sahlberg
2019-11-13  6:49     ` Ronnie Sahlberg
2019-11-13 22:15       ` Frank Sorenson
2019-11-14  1:39         ` Ronnie Sahlberg
2019-11-15 18:21           ` Pavel Shilovskiy
2019-11-17 16:29           ` Frank Sorenson
2019-11-21 19:41             ` Pavel Shilovsky
2019-11-13  3:28 ` Ronnie Sahlberg
