* A process killed while opening a file can result in leaked open handle on the server
@ 2019-11-12 19:34 Frank Sorenson
  2019-11-13  2:37 ` Pavel Shilovsky
  2019-11-13  3:28 ` Ronnie Sahlberg
  0 siblings, 2 replies; 10+ messages in thread
From: Frank Sorenson @ 2019-11-12 19:34 UTC (permalink / raw)
  To: linux-cifs

If a process opening a file is killed while waiting for an SMB2_CREATE
response from the server, the response may not be handled by the client,
leaking an open file handle on the server.

This can be reproduced with the following:

# mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
# cd /mnt/vm3
# echo foo > foo

# for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done

(increase the count if necessary -- 100 appears sufficient to cause
multiple leaked file handles)

The client will stop waiting for the response, and will output
the following when the response does arrive:

CIFS VFS: Close unmatched open

On the server side, an open file handle is leaked. If using samba,
the following will show these open files:

# smbstatus | grep -i Locked -A1000
Locked files:
Pid    Uid   DenyMode   Access    R/W     Oplock      SharePath    Name  Time
--------------------------------------------------------------------------------------------------
25936  501   DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019

(also tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1771691)

Frank
--
Frank Sorenson
sorenson@redhat.com
Principal Software Maintenance Engineer
Global Support Services - filesystems
Red Hat

^ permalink raw reply	[flat|nested] 10+ messages in thread
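[Not part of the original report: when iterating on the reproducer above, it is handy to count leaked handles automatically instead of eyeballing smbstatus. A minimal sketch, assuming `smbstatus` output laid out like the listing above -- the helper only parses text on stdin, and column positions may differ between Samba versions:]

```shell
# count_open_handles NAME
# Reads `smbstatus` "Locked files" output on stdin and prints how many
# open handles reference the share-relative path NAME. Data rows start
# with the numeric smbd pid; in the layout shown above the Name column
# is field 8 (Pid Uid DenyMode Access R/W Oplock SharePath Name Time).
count_open_handles() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+$/ && $8 == name { n++ }
        END { print n + 0 }'
}
```

With that in place, `smbstatus | count_open_handles foo` run after all the cat processes have exited shows whether handles on `foo` were leaked; any non-zero count at that point is a leak.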
* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server Frank Sorenson
@ 2019-11-13  2:37 ` Pavel Shilovsky
  2019-11-13  5:19   ` Ronnie Sahlberg
  2019-11-13  3:28 ` Ronnie Sahlberg
  1 sibling, 1 reply; 10+ messages in thread
From: Pavel Shilovsky @ 2019-11-13 2:37 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: linux-cifs

[-- Attachment #1: Type: text/plain, Size: 3721 bytes --]

Tue, 12 Nov 2019 at 11:34, Frank Sorenson <sorenson@redhat.com>:
>
> If a process opening a file is killed while waiting for an SMB2_CREATE
> response from the server, the response may not be handled by the client,
> leaking an open file handle on the server.
>

Thanks for reporting the problem.

> This can be reproduced with the following:
>
> # mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
> # cd /mnt/vm3
> # echo foo > foo
>
> # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done
>
> (increase the count if necessary -- 100 appears sufficient to cause
> multiple leaked file handles)
>

This is a known problem and the client handles it by closing unmatched
opens (the ones that don't have a corresponding waiting thread)
immediately. It is indicated by the message you observed: "Close
unmatched open".

> The client will stop waiting for the response, and will output
> the following when the response does arrive:
>
> CIFS VFS: Close unmatched open
>
> On the server side, an open file handle is leaked. If using samba,
> the following will show these open files:
>
> # smbstatus | grep -i Locked -A1000
> Locked files:
> Pid    Uid   DenyMode   Access    R/W     Oplock      SharePath    Name  Time
> --------------------------------------------------------------------------------------------------
> 25936  501   DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x80      RDONLY  NONE        /home/user1  .     Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
> 25936  501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo   Tue Nov 12 12:29:24 2019
>

I tried it locally myself; it seems that we have another issue related
to interrupted close requests:

[1656476.740311] fs/cifs/file.c: Flush inode 0000000078729371 file 000000004d5f9348 rc 0
[1656476.740313] fs/cifs/file.c: closing last open instance for inode 0000000078729371
[1656476.740314] fs/cifs/file.c: CIFS VFS: in _cifsFileInfo_put as Xid: 457 with uid: 1000
[1656476.740315] fs/cifs/smb2pdu.c: Close
[1656476.740317] fs/cifs/transport.c: signal is pending before sending any data

This will return a -512 (-ERESTARTSYS) error to the upper layer, but we
do not save the problematic handle somewhere to close it later. The
op->release() error code is not being checked by VFS anyway.

We should do the same thing as we do today for cancelled mids:
allocate a "struct close_cancelled_open" and queue the lazy work to
close the handle.

I attached the patch implementing this idea. The patch handles the
interrupt errors in smb2_close_file, which is called during the close
system call. It fixed the problem in my setup. At the same time, I
think I should probably move the handling to the PDU layer function
SMB2_close() to handle *all* interrupted close requests, including ones
closing temporary handles. I am wondering if anyone has thoughts about
it. Anyway, review of the patch is highly appreciated.

@Frank, could you please test the patch in your environment?
--
Best regards,
Pavel Shilovsky

[-- Attachment #2: 0001-CIFS-Close-open-handle-after-interrupted-close.patch --]
[-- Type: application/octet-stream, Size: 4120 bytes --]

From 7421925b2e7ee0d47eebfb291268784078ea8812 Mon Sep 17 00:00:00 2001
From: Pavel Shilovsky <pshilov@microsoft.com>
Date: Tue, 12 Nov 2019 16:38:13 -0800
Subject: [PATCH] CIFS: Close open handle after interrupted close

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
---
 fs/cifs/smb2misc.c  | 59 ++++++++++++++++++++++++++++++++++-----------
 fs/cifs/smb2ops.c   | 11 ++++++++-
 fs/cifs/smb2proto.h |  3 +++
 3 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 527c9efd3de0..80a8f4b2daab 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -725,36 +725,67 @@ smb2_cancelled_close_fid(struct work_struct *work)
 	kfree(cancelled);
 }
 
+/* Caller should already has an extra reference to @tcon */
+static int
+__smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
+			      __u64 volatile_fid)
+{
+	struct close_cancelled_open *cancelled;
+
+	cancelled = kzalloc(sizeof(*cancelled), GFP_KERNEL);
+	if (!cancelled)
+		return -ENOMEM;
+
+	cancelled->fid.persistent_fid = persistent_fid;
+	cancelled->fid.volatile_fid = volatile_fid;
+	cancelled->tcon = tcon;
+	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
+	WARN_ON(queue_work(cifsiod_wq, &cancelled->work) == false);
+
+	return 0;
+}
+
+int
+smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
+			    __u64 volatile_fid)
+{
+	int rc;
+
+	cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count);
+	spin_lock(&cifs_tcp_ses_lock);
+	tcon->tc_count++;
+	spin_unlock(&cifs_tcp_ses_lock);
+
+	rc = __smb2_handle_cancelled_close(tcon, persistent_fid, volatile_fid);
+	if (rc)
+		cifs_put_tcon(tcon);
+
+	return rc;
+}
+
 int
 smb2_handle_cancelled_mid(char *buffer, struct TCP_Server_Info *server)
 {
 	struct smb2_sync_hdr *sync_hdr = (struct smb2_sync_hdr *)buffer;
 	struct smb2_create_rsp *rsp = (struct smb2_create_rsp *)buffer;
 	struct cifs_tcon *tcon;
-	struct close_cancelled_open *cancelled;
+	int rc;
 
 	if (sync_hdr->Command != SMB2_CREATE ||
 	    sync_hdr->Status != STATUS_SUCCESS)
 		return 0;
 
-	cancelled = kzalloc(sizeof(*cancelled), GFP_KERNEL);
-	if (!cancelled)
-		return -ENOMEM;
-
 	tcon = smb2_find_smb_tcon(server, sync_hdr->SessionId,
 				  sync_hdr->TreeId);
-	if (!tcon) {
-		kfree(cancelled);
+	if (!tcon)
 		return -ENOENT;
-	}
 
-	cancelled->fid.persistent_fid = rsp->PersistentFileId;
-	cancelled->fid.volatile_fid = rsp->VolatileFileId;
-	cancelled->tcon = tcon;
-	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
-	queue_work(cifsiod_wq, &cancelled->work);
+	rc = __smb2_handle_cancelled_close(tcon, rsp->PersistentFileId,
+					   rsp->VolatileFileId);
+	if (rc)
+		cifs_put_tcon(tcon);
 
-	return 0;
+	return rc;
 }
 
 /**
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 3bbb65e58dd6..4f47800ee279 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -1316,7 +1316,16 @@ static void
 smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon,
 		struct cifs_fid *fid)
 {
-	SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid);
+	int rc;
+
+	rc = SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid);
+	if (is_interrupt_error(rc)) {
+		rc = smb2_handle_cancelled_close(tcon, fid->persistent_fid,
+						 fid->volatile_fid);
+		if (rc)
+			cifs_dbg(VFS, "close handle 0x%llx returned error %d\n",
+				 fid->persistent_fid, rc);
+	}
 }
 
 static int
diff --git a/fs/cifs/smb2proto.h b/fs/cifs/smb2proto.h
index 07ca72486cfa..e239f98093a9 100644
--- a/fs/cifs/smb2proto.h
+++ b/fs/cifs/smb2proto.h
@@ -203,6 +203,9 @@ extern int SMB2_set_compression(const unsigned int xid, struct cifs_tcon *tcon,
 extern int SMB2_oplock_break(const unsigned int xid, struct cifs_tcon *tcon,
 			     const u64 persistent_fid, const u64 volatile_fid,
 			     const __u8 oplock_level);
+extern int smb2_handle_cancelled_close(struct cifs_tcon *tcon,
+				       __u64 persistent_fid,
+				       __u64 volatile_fid);
 extern int smb2_handle_cancelled_mid(char *buffer,
				     struct TCP_Server_Info *server);
 void smb2_cancelled_close_fid(struct work_struct *work);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  2:37 ` Pavel Shilovsky
@ 2019-11-13  5:19   ` Ronnie Sahlberg
  2019-11-13  6:49     ` Ronnie Sahlberg
  0 siblings, 1 reply; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-13 5:19 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Frank Sorenson, linux-cifs

----- Original Message -----
> From: "Pavel Shilovsky" <piastryyy@gmail.com>
> To: "Frank Sorenson" <sorenson@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Wednesday, 13 November, 2019 12:37:38 PM
> Subject: Re: A process killed while opening a file can result in leaked open handle on the server
>
> [full quote of the bug report and debugging output trimmed]
>
> We should do the same thing as we do today for cancelled mids:
> allocate a "struct close_cancelled_open" and queue the lazy work to
> close the handle.
>
> I attached the patch implementing this idea. The patch handles the
> interrupt errors in smb2_close_file, which is called during the close
> system call. It fixed the problem in my setup. At the same time, I
> think I should probably move the handling to the PDU layer function
> SMB2_close() to handle *all* interrupted close requests, including
> ones closing temporary handles. I am wondering if anyone has thoughts
> about it. Anyway, review of the patch is highly appreciated.

I think your patch makes it better. I can see that your patch now fixes
the leak for Close(). But there is still a leak in Open(). I just got
one instance where I leaked one handle. The log shows:

CIFS VFS: \\192.168.124.207 Cancelling wait for mid 25 cmd: 5

and wireshark shows an Open() request/response for this MID, but we
never send a Close() for the handle.

> @Frank, could you please test the patch in your environment?
>
> --
> Best regards,
> Pavel Shilovsky
>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  5:19 ` Ronnie Sahlberg
@ 2019-11-13  6:49   ` Ronnie Sahlberg
  2019-11-13 22:15     ` Frank Sorenson
  0 siblings, 1 reply; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-13 6:49 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Frank Sorenson, linux-cifs

[-- Attachment #1: Type: text/plain, Size: 6332 bytes --]

Steve, Pavel

This patch goes on top of Pavel's patch. Maybe it should be merged with
Pavel's patch, since his patch changes the behaviour from "we only send
a close() on an interrupted open()" to "we send a close() on either an
interrupted open() or an interrupted close()", so both the comments and
the log messages are updated.

Additionally, it adds logging of the MID that failed in the case of an
interrupted Open(), so that it is easy to find it in wireshark and check
whether that smb2 file handle was indeed handled by an SMB2_Close() or
not.

From testing, it appears Pavel's patch works. When the close() is
interrupted we don't leak handles, as far as I can tell. We do have a
leak in the Open() case though, and it seems that even though we set
things up and flag the MID to be cancelled, we actually never end up
calling smb2_cancelled_close_fid() and thus we never send an
SMB2_Close(). I haven't found the root cause yet, but I suspect we mess
up the mid flags or state somewhere.

It did work in the past, though, when Sachin provided the initial
implementation, so I think we have regressed. I have added a new test
'cifs/102' to the buildbot that checks for this, but have not integrated
it into the cifs-testing run yet since we still fail this test. At least
we will not have further regressions once we fix this and enable the
test in the future.
ronnie s

----- Original Message -----
From: "Ronnie Sahlberg" <lsahlber@redhat.com>
To: "Pavel Shilovsky" <piastryyy@gmail.com>
Cc: "Frank Sorenson" <sorenson@redhat.com>, "linux-cifs" <linux-cifs@vger.kernel.org>
Sent: Wednesday, 13 November, 2019 3:19:11 PM
Subject: Re: A process killed while opening a file can result in leaked open handle on the server

[full quote of the previous message trimmed]

[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 3254 bytes --]

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index d78bfcc19156..39a9165f9bba 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1551,6 +1551,8 @@ struct close_cancelled_open {
 	struct cifs_fid         fid;
 	struct cifs_tcon        *tcon;
 	struct work_struct      work;
+	__u64                   mid;
+	__u16                   cmd;
 };
 
 /* Make code in transport.c a little cleaner by moving
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index ba1e5754239c..1fb9932d3dae 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -735,19 +735,36 @@ smb2_cancelled_close_fid(struct work_struct *work)
 {
 	struct close_cancelled_open *cancelled = container_of(work,
 					struct close_cancelled_open, work);
+	struct cifs_tcon *tcon = cancelled->tcon;
+	int rc;
 
-	cifs_dbg(VFS, "Close unmatched open\n");
+	if (cancelled->mid)
+		cifs_tcon_dbg(VFS, "Close unmatched open for MID:%llx\n",
+			      cancelled->mid);
+	else
+		cifs_tcon_dbg(VFS, "Close interrupted close\n");
 
-	SMB2_close(0, cancelled->tcon, cancelled->fid.persistent_fid,
-		   cancelled->fid.volatile_fid);
-	cifs_put_tcon(cancelled->tcon);
+	rc = SMB2_close(0, tcon, cancelled->fid.persistent_fid,
+			cancelled->fid.volatile_fid);
+	if (rc) {
+		cifs_tcon_dbg(VFS, "Close cancelled mid failed with rc:%d\n",
+			      rc);
+	}
+
+	cifs_put_tcon(tcon);
 	kfree(cancelled);
 }
 
-/* Caller should already has an extra reference to @tcon */
+/*
+ * Caller should already has an extra reference to @tcon
+ * This function is used to queue work to close a handle to prevent leaks
+ * on the server.
+ * We handle two cases. If an open was interrupted after we sent the
+ * SMB2_CREATE to the server but before we processed the reply, and second
+ * if a close was interrupted before we sent the SMB2_CLOSE to the server.
+ */
 static int
-__smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
-			      __u64 volatile_fid)
+__smb2_handle_cancelled_cmd(struct cifs_tcon *tcon, __u16 cmd, __u64 mid,
+			    __u64 persistent_fid, __u64 volatile_fid)
 {
 	struct close_cancelled_open *cancelled;
 
@@ -758,6 +775,8 @@ __smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
 	cancelled->fid.persistent_fid = persistent_fid;
 	cancelled->fid.volatile_fid = volatile_fid;
 	cancelled->tcon = tcon;
+	cancelled->cmd = cmd;
+	cancelled->mid = mid;
 	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
 	WARN_ON(queue_work(cifsiod_wq, &cancelled->work) == false);
 
@@ -775,7 +794,8 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
 	tcon->tc_count++;
 	spin_unlock(&cifs_tcp_ses_lock);
 
-	rc = __smb2_handle_cancelled_close(tcon, persistent_fid, volatile_fid);
+	rc = __smb2_handle_cancelled_cmd(tcon, SMB2_CLOSE_HE, 0,
+					 persistent_fid, volatile_fid);
 	if (rc)
 		cifs_put_tcon(tcon);
 
@@ -799,8 +819,11 @@ smb2_handle_cancelled_mid(char *buffer, struct TCP_Server_Info *server)
 	if (!tcon)
 		return -ENOENT;
 
-	rc = __smb2_handle_cancelled_close(tcon, rsp->PersistentFileId,
-					   rsp->VolatileFileId);
+	rc = __smb2_handle_cancelled_cmd(tcon,
+					 le16_to_cpu(sync_hdr->Command),
+					 le64_to_cpu(sync_hdr->MessageId),
+					 rsp->PersistentFileId,
+					 rsp->VolatileFileId);
 	if (rc)
 		cifs_put_tcon(tcon);

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13  6:49 ` Ronnie Sahlberg
@ 2019-11-13 22:15   ` Frank Sorenson
  2019-11-14  1:39     ` Ronnie Sahlberg
  0 siblings, 1 reply; 10+ messages in thread
From: Frank Sorenson @ 2019-11-13 22:15 UTC (permalink / raw)
  To: Ronnie Sahlberg, Pavel Shilovsky; +Cc: linux-cifs

On 11/13/19 12:49 AM, Ronnie Sahlberg wrote:
> Steve, Pavel
>
> This patch goes on top of Pavel's patch.
>
> [rest of quote trimmed]
>
> ronnie s

The patches do indeed improve it significantly.

I'm still seeing some leak as well, and I'm removing ratelimiting so
that I can see what the added debugging is trying to tell us. I'll
report if I find more details.

Thanks for the help.

Frank

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server
  2019-11-13 22:15 ` Frank Sorenson
@ 2019-11-14  1:39   ` Ronnie Sahlberg
  2019-11-15 18:21     ` Pavel Shilovskiy
  2019-11-17 16:29     ` Frank Sorenson
  0 siblings, 2 replies; 10+ messages in thread
From: Ronnie Sahlberg @ 2019-11-14 1:39 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Pavel Shilovsky, linux-cifs

----- Original Message -----
> From: "Frank Sorenson" <sorenson@redhat.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "Pavel Shilovsky" <piastryyy@gmail.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Thursday, 14 November, 2019 8:15:46 AM
> Subject: Re: A process killed while opening a file can result in leaked open handle on the server
>
> On 11/13/19 12:49 AM, Ronnie Sahlberg wrote:
> > [quote trimmed]
>
> The patches do indeed improve it significantly.
>
> I'm still seeing some leak as well, and I'm removing ratelimiting so
> that I can see what the added debugging is trying to tell us. I'll
> report if I find more details.
>

We are making progress.
Can you test this patch and see if it improves things even more for you?
It fixes most, but not all, of the leaks I see for interrupted open().

I will post this to the list too as a separate mail/patch.


Author: Ronnie Sahlberg <lsahlber@redhat.com>
Date:   Thu Nov 14 11:23:06 2019 +1000

    cifs: fix race between compound_send_recv() and the demultiplex thread
    
    There is a race where the open() may be interrupted between when we
    receive the reply but before we have invoked the callback, in which
    case we never end up calling handle_cancelled_mid() and thus leak an
    open handle on the server.
    
    Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index ccaa8bad336f..802604a7e692 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p)
 			if (mids[i] != NULL) {
 				mids[i]->resp_buf_size = server->pdu_size;
 				if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) &&
-				    mids[i]->mid_state == MID_RESPONSE_RECEIVED &&
 				    server->ops->handle_cancelled_mid)
 					server->ops->handle_cancelled_mid(
 							mids[i]->resp_buf,
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index ca3de62688d6..28018a7eccb2 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -1119,7 +1119,8 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses,
 				 midQ[i]->mid, le16_to_cpu(midQ[i]->command));
 			send_cancel(server, &rqst[i], midQ[i]);
 			spin_lock(&GlobalMid_Lock);
-			if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) {
+			if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED ||
+			    midQ[i]->mid_state == MID_RESPONSE_RECEIVED) {
 				midQ[i]->mid_flags |= MID_WAIT_CANCELLED;
 				midQ[i]->callback = cifs_cancelled_callback;
 				cancelled_mid[i] = true;

> Thanks for the help.
>
> Frank
>

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* RE: A process killed while opening a file can result in leaked open handle on the server 2019-11-14 1:39 ` Ronnie Sahlberg @ 2019-11-15 18:21 ` Pavel Shilovskiy 2019-11-17 16:29 ` Frank Sorenson 1 sibling, 0 replies; 10+ messages in thread From: Pavel Shilovskiy @ 2019-11-15 18:21 UTC (permalink / raw) To: Ronnie Sahlberg, Frank Sorenson, Steven French Cc: Pavel Shilovsky, linux-cifs ср, 13 нояб. 2019 г. в 17:39, Ronnie Sahlberg <lsahlber@redhat.com>: > > > > > > ----- Original Message ----- > > From: "Frank Sorenson" <sorenson@redhat.com> > > To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "Pavel Shilovsky" <piastryyy@gmail.com> > > Cc: "linux-cifs" <linux-cifs@vger.kernel.org> > > Sent: Thursday, 14 November, 2019 8:15:46 AM > > Subject: Re: A process killed while opening a file can result in leaked open handle on the server > > > > On 11/13/19 12:49 AM, Ronnie Sahlberg wrote: > > > Steve, Pavel > > > > > > This patch goes ontop of Pavels patch. > > > Maybe it should be merged with Pavels patch since his patch changes from > > > "we only send a close() on an interrupted open()" > > > to now "we send a close() on either interrupted open() or interrupted > > > close()" so both comments as well as log messages are updates. > > > > > > Additionally it adds logging of the MID that failed in the case of an > > > interrupted Open() so that it is easy to find it in wireshark > > > and check whether that smb2 file handle was indeed handles by a SMB_Close() > > > or not. > > > > > > > > > From testing it appears Pavels patch works. When the close() is interrupted > > > we don't leak handles as far as I can tell. > > > We do have a leak in the Open() case though and it seems that eventhough we > > > set things up and flags the MID to be cancelled we actually never end up > > > calling smb2_cancelled_close_fid() and thus we never send a SMB2_Close(). > > > I haven't found the root cause yet but I suspect we mess up mid flags or > > > state somewhere. 
> > > > > > > > > It did work in the past though when Sachin provided the initial > > > implementation so we have regressed I think. > > > I have added a new test 'cifs/102' to the buildbot that checks for this > > > but have not integrated into the cifs-testing run yet since we still fail > > > this test. > > > At least we will not have further regressions once we fix this and enable > > > the test in the future. > > > > > > ronnie s > > > > The patches do indeed improve it significantly. > > > > I'm still seeing some leak as well, and I'm removing ratelimiting so > > that I can see what the added debugging is trying to tell us. I'll > > report if I find more details. > > > > We are making progress. > Can you test this patch if it improves even more for you? > It fixes most but not all the leaks I see for interrupted open(): > > I will post this to the list too as a separate mail/patch. > > > Author: Ronnie Sahlberg <lsahlber@redhat.com> > Date: Thu Nov 14 11:23:06 2019 +1000 > > cifs: fix race between compound_send_recv() and the demultiplex thread > > There is a race where the open() may be interrupted between when we receive the reply > but before we have invoked the callback in which case we never end up calling > handle_cancelled_mid() and thus leak an open handle on the server. 
>
> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index ccaa8bad336f..802604a7e692 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -1223,7 +1223,6 @@ cifs_demultiplex_thread(void *p)
>  			if (mids[i] != NULL) {
>  				mids[i]->resp_buf_size = server->pdu_size;
>  				if ((mids[i]->mid_flags & MID_WAIT_CANCELLED) &&
> -				    mids[i]->mid_state == MID_RESPONSE_RECEIVED &&
>  				    server->ops->handle_cancelled_mid)
>  					server->ops->handle_cancelled_mid(
>  							mids[i]->resp_buf,
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index ca3de62688d6..28018a7eccb2 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -1119,7 +1119,8 @@ compound_send_recv(const unsigned int xid, struct cifs_ses *ses,
>  			 midQ[i]->mid, le16_to_cpu(midQ[i]->command));
>  		send_cancel(server, &rqst[i], midQ[i]);
>  		spin_lock(&GlobalMid_Lock);
> -		if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED) {
> +		if (midQ[i]->mid_state == MID_REQUEST_SUBMITTED ||
> +		    midQ[i]->mid_state == MID_RESPONSE_RECEIVED) {
>  			midQ[i]->mid_flags |= MID_WAIT_CANCELLED;
>  			midQ[i]->callback = cifs_cancelled_callback;
>  			cancelled_mid[i] = true;
> >
> > Thanks for the help.
> >
> >
> > Frank
> >

Thanks Frank, Ronnie for testing my patch! I also like Ronnie's patch - good progress overall!

About my patch, I am going to add a proper description once I get time next week and will probably do some improvements:

1) Currently with my patch the client will send Close even if it has already received a response but wait_for_response() was interrupted. The latter function returns -ERESTARTSYS. What we want to do instead is to send Close only if *sending* was interrupted.
In that case we return -EINTR, see __smb_send_rqst():

	if (signal_pending(current)) {
		cifs_dbg(FYI, "signal is pending before sending any data\n");
		return -EINTR;
	}

2) Move the error handling to the PDU layer function SMB2_close() to handle *all* interrupted close requests, including ones closing temporary handles.

Please comment on the two changes above!

--
Best regards,
Pavel Shilovsky

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server 2019-11-14 1:39 ` Ronnie Sahlberg 2019-11-15 18:21 ` Pavel Shilovskiy @ 2019-11-17 16:29 ` Frank Sorenson 2019-11-21 19:41 ` Pavel Shilovsky 1 sibling, 1 reply; 10+ messages in thread From: Frank Sorenson @ 2019-11-17 16:29 UTC (permalink / raw) To: Ronnie Sahlberg; +Cc: Pavel Shilovsky, linux-cifs On 11/13/19 7:39 PM, Ronnie Sahlberg wrote: > ----- Original Message ----- >> From: "Frank Sorenson" <sorenson@redhat.com> >> To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "Pavel Shilovsky" <piastryyy@gmail.com> >> Cc: "linux-cifs" <linux-cifs@vger.kernel.org> >> Sent: Thursday, 14 November, 2019 8:15:46 AM >> Subject: Re: A process killed while opening a file can result in leaked open handle on the server >> >> On 11/13/19 12:49 AM, Ronnie Sahlberg wrote: >>> Steve, Pavel >>> >>> This patch goes ontop of Pavels patch. >>> Maybe it should be merged with Pavels patch since his patch changes from >>> "we only send a close() on an interrupted open()" >>> to now "we send a close() on either interrupted open() or interrupted >>> close()" so both comments as well as log messages are updates. >>> >>> Additionally it adds logging of the MID that failed in the case of an >>> interrupted Open() so that it is easy to find it in wireshark >>> and check whether that smb2 file handle was indeed handles by a SMB_Close() >>> or not. >>> >>> >>> From testing it appears Pavels patch works. When the close() is interrupted >>> we don't leak handles as far as I can tell. >>> We do have a leak in the Open() case though and it seems that eventhough we >>> set things up and flags the MID to be cancelled we actually never end up >>> calling smb2_cancelled_close_fid() and thus we never send a SMB2_Close(). >>> I haven't found the root cause yet but I suspect we mess up mid flags or >>> state somewhere. 
>>> >>> >>> It did work in the past though when Sachin provided the initial >>> implementation so we have regressed I think. >>> I have added a new test 'cifs/102' to the buildbot that checks for this >>> but have not integrated into the cifs-testing run yet since we still fail >>> this test. >>> At least we will not have further regressions once we fix this and enable >>> the test in the future. >>> >>> ronnie s >> >> The patches do indeed improve it significantly. >> >> I'm still seeing some leak as well, and I'm removing ratelimiting so >> that I can see what the added debugging is trying to tell us. I'll >> report if I find more details. > We are making progress. Agreed. We're definitely making progress. > Can you test this patch if it improves even more for you? > It fixes most but not all the leaks I see for interrupted open(): > > I will post this to the list too as a separate mail/patch. Sorry to be slow on the testing. I might be seeing some small improvement with this one, but I'm still seeing some mismatches: # for i in {1..100} ; do cat /mnt/vm3/foo.$i >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done ... This ended up with 2 open on the server side: 21842 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 foo.32 Sun Nov 17 09:13:38 2019 21842 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 foo.48 Sun Nov 17 09:13:38 2019 the packet capture shows the same mismatch pattern for these two: 102 Create Request File: foo.32;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request 103 Create Response File: foo.32;GetInfo Response;Close Response 104 Create Request File: foo.32 105 Create Response File: foo.32 148 Create Request File: foo.48;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request 149 Create Response File: foo.48;GetInfo Response;Close Response 150 Create Request File: foo.48 151 Create Response File: foo.48 with no close for those two. 
the messages are also similar, and show transmitting the second open request and cancelling the wait immediately afterward: [9006] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1091 with uid: 0 [9006] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.32 and dentry = 0x0000000053e436bf [9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32 [9006] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup [9006] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.32 inode = 0x000000004667ea0b [9006] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.32 [9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356 [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=113 state=4 [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=16 mid=114 state=4 [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=115 state=4 [9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9006] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=5934909 [9006] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 5934909 [9006] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 5934909 mapping [9006] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1091) rc = 0 [9006] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as Xid: 1092 with uid: 0 [9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32 [9006] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x000000001a16a2ae file flags 
are 0x8000 for \foo.32 [9006] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open [9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284 CIFS VFS: \\vm3 Cancelling wait for mid 116 cmd: 5 [9006] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1092) rc = -512 [9039] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1109 with uid: 0 [9039] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.48 and dentry = 0x0000000040aea0d9 [9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48 [9039] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup [9039] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.48 inode = 0x000000004667ea0b [9039] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.48 [9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356 [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=158 state=4 [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=16 mid=159 state=4 [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=160 state=4 [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release [9039] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=21857488 [9039] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 21857488 [9039] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 21857488 mapping [9039] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1109) rc = 0 [9039] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as 
Xid: 1110 with uid: 0 [9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48 [9039] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x00000000001a4f79 file flags are 0x8000 for \foo.48 [9039] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open [9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284 CIFS VFS: \\vm3 Cancelling wait for mid 161 cmd: 5 [9039] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1110) rc = -512 Frank ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server 2019-11-17 16:29 ` Frank Sorenson @ 2019-11-21 19:41 ` Pavel Shilovsky 0 siblings, 0 replies; 10+ messages in thread From: Pavel Shilovsky @ 2019-11-21 19:41 UTC (permalink / raw) To: Frank Sorenson; +Cc: Ronnie Sahlberg, linux-cifs Hi Frank, I have just posted 3 patches to the list to address remaining handle leaks: CIFS: Close open handle after interrupted close (2nd version of my original patch to handle interrupted closes) CIFS: Fix NULL pointer dereference in mid callback (not a handle leak fix but the kernel kept crashing on my system during intensive testing for handle leaks) CIFS: Do not miss cancelled OPEN responses (fix to handle some unmatched opens that are not processed right now due to races) Could you please test them in your environment? I ran a script similar to what you use for repro (just more iterations) and Samba didn't show leaked handles afterwards. -- Best regards, Pavel Shilovsky вс, 17 нояб. 2019 г. в 08:29, Frank Sorenson <sorenson@redhat.com>: > > On 11/13/19 7:39 PM, Ronnie Sahlberg wrote: > > ----- Original Message ----- > >> From: "Frank Sorenson" <sorenson@redhat.com> > >> To: "Ronnie Sahlberg" <lsahlber@redhat.com>, "Pavel Shilovsky" <piastryyy@gmail.com> > >> Cc: "linux-cifs" <linux-cifs@vger.kernel.org> > >> Sent: Thursday, 14 November, 2019 8:15:46 AM > >> Subject: Re: A process killed while opening a file can result in leaked open handle on the server > >> > >> On 11/13/19 12:49 AM, Ronnie Sahlberg wrote: > >>> Steve, Pavel > >>> > >>> This patch goes ontop of Pavels patch. > >>> Maybe it should be merged with Pavels patch since his patch changes from > >>> "we only send a close() on an interrupted open()" > >>> to now "we send a close() on either interrupted open() or interrupted > >>> close()" so both comments as well as log messages are updates. 
> >>> > >>> Additionally it adds logging of the MID that failed in the case of an > >>> interrupted Open() so that it is easy to find it in wireshark > >>> and check whether that smb2 file handle was indeed handles by a SMB_Close() > >>> or not. > >>> > >>> > >>> From testing it appears Pavels patch works. When the close() is interrupted > >>> we don't leak handles as far as I can tell. > >>> We do have a leak in the Open() case though and it seems that eventhough we > >>> set things up and flags the MID to be cancelled we actually never end up > >>> calling smb2_cancelled_close_fid() and thus we never send a SMB2_Close(). > >>> I haven't found the root cause yet but I suspect we mess up mid flags or > >>> state somewhere. > >>> > >>> > >>> It did work in the past though when Sachin provided the initial > >>> implementation so we have regressed I think. > >>> I have added a new test 'cifs/102' to the buildbot that checks for this > >>> but have not integrated into the cifs-testing run yet since we still fail > >>> this test. > >>> At least we will not have further regressions once we fix this and enable > >>> the test in the future. > >>> > >>> ronnie s > >> > >> The patches do indeed improve it significantly. > >> > >> I'm still seeing some leak as well, and I'm removing ratelimiting so > >> that I can see what the added debugging is trying to tell us. I'll > >> report if I find more details. > > > We are making progress. > > Agreed. We're definitely making progress. > > > Can you test this patch if it improves even more for you? > > It fixes most but not all the leaks I see for interrupted open(): > > > > I will post this to the list too as a separate mail/patch. > > > Sorry to be slow on the testing. > > > I might be seeing some small improvement with this one, but I'm still seeing some mismatches: > > # for i in {1..100} ; do cat /mnt/vm3/foo.$i >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! ; done > ... 
> > This ended up with 2 open on the server side: > > 21842 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 foo.32 Sun Nov 17 09:13:38 2019 > 21842 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 foo.48 Sun Nov 17 09:13:38 2019 > > > the packet capture shows the same mismatch pattern for these two: > > 102 Create Request File: foo.32;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request > 103 Create Response File: foo.32;GetInfo Response;Close Response > 104 Create Request File: foo.32 > 105 Create Response File: foo.32 > > > 148 Create Request File: foo.48;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request > 149 Create Response File: foo.48;GetInfo Response;Close Response > 150 Create Request File: foo.48 > 151 Create Response File: foo.48 > > with no close for those two. > > > the messages are also similar, and show transmitting the second open request and cancelling the wait immediately afterward: > [9006] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1091 with uid: 0 > [9006] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.32 and dentry = 0x0000000053e436bf > [9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32 > [9006] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup > [9006] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.32 inode = 0x000000004667ea0b > [9006] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.32 > [9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356 > [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=113 state=4 > [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=16 mid=114 state=4 > [9006] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=115 state=4 > [9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9006] 
cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9006] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9006] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=5934909 > [9006] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 5934909 > [9006] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 5934909 mapping > [9006] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1091) rc = 0 > [9006] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as Xid: 1092 with uid: 0 > [9006] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.32 > [9006] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x000000001a16a2ae file flags are 0x8000 for \foo.32 > [9006] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open > [9006] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284 > CIFS VFS: \\vm3 Cancelling wait for mid 116 cmd: 5 > [9006] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1092) rc = -512 > > > > [9039] cifs:cifs_lookup:669: fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 1109 with uid: 0 > [9039] cifs:cifs_lookup:672: fs/cifs/dir.c: parent inode = 0x000000008f9424fe name is: foo.48 and dentry = 0x0000000040aea0d9 > [9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48 > [9039] cifs:cifs_lookup:704: fs/cifs/dir.c: NULL inode in lookup > [9039] cifs:cifs_lookup:707: fs/cifs/dir.c: Full path: \foo.48 inode = 0x000000004667ea0b > [9039] cifs:cifs_get_inode_info:753: fs/cifs/inode.c: Getting info on \foo.48 > [9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=356 > [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=5 mid=158 state=4 > [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: 
cmd=16 mid=159 state=4 > [9039] cifs:cifs_sync_mid_result:859: fs/cifs/transport.c: cifs_sync_mid_result: cmd=6 mid=160 state=4 > [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9039] cifs:cifs_small_buf_release:222: fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release > [9039] cifs:cifs_iget:1030: fs/cifs/inode.c: looking for uniqueid=21857488 > [9039] cifs:cifs_revalidate_cache:100: fs/cifs/inode.c: cifs_revalidate_cache: revalidating inode 21857488 > [9039] cifs:cifs_revalidate_cache:124: fs/cifs/inode.c: cifs_revalidate_cache: invalidating inode 21857488 mapping > [9039] cifs:cifs_lookup:734: fs/cifs/dir.c: CIFS VFS: leaving cifs_lookup (xid = 1109) rc = 0 > [9039] cifs:cifs_open:512: fs/cifs/file.c: CIFS VFS: in cifs_open as Xid: 1110 with uid: 0 > [9039] cifs:build_path_from_dentry_optional_prefix:143: fs/cifs/dir.c: name: \foo.48 > [9039] cifs:cifs_open:530: fs/cifs/file.c: inode = 0x00000000001a4f79 file flags are 0x8000 for \foo.48 > [9039] cifs:SMB2_open:2581: fs/cifs/smb2pdu.c: create/open > [9039] cifs:__smb_send_rqst:368: fs/cifs/transport.c: Sending smb: smb_len=284 > CIFS VFS: \\vm3 Cancelling wait for mid 161 cmd: 5 > [9039] cifs:cifs_open:618: fs/cifs/file.c: CIFS VFS: leaving cifs_open (xid = 1110) rc = -512 > > > Frank > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: A process killed while opening a file can result in leaked open handle on the server 2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server Frank Sorenson 2019-11-13 2:37 ` Pavel Shilovsky @ 2019-11-13 3:28 ` Ronnie Sahlberg 1 sibling, 0 replies; 10+ messages in thread From: Ronnie Sahlberg @ 2019-11-13 3:28 UTC (permalink / raw) To: Frank Sorenson; +Cc: linux-cifs [-- Attachment #1: Type: text/plain, Size: 3309 bytes --] ----- Original Message ----- > From: "Frank Sorenson" <sorenson@redhat.com> > To: "linux-cifs" <linux-cifs@vger.kernel.org> > Sent: Wednesday, 13 November, 2019 5:34:27 AM > Subject: A process killed while opening a file can result in leaked open handle on the server > > If a process opening a file is killed while waiting for a SMB2_CREATE > response from the server, the response may not be handled by the client, > leaking an open file handle on the server. > > This can be reproduced with the following: > > # mount //vm3/user1 /mnt/vm3 > -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds > # cd /mnt/vm3 > # echo foo > foo > > # for i in {1..100} ; do cat foo >/dev/null 2>&1 & sleep 0.0001 ; kill -9 $! > ; done > > (increase count if necessary--100 appears sufficient to cause multiple leaked > file handles) > > > The client will stop waiting for the response, and will output > the following when the response does arrive: > > CIFS VFS: Close unmatched open > > on the server side, an open file handle is leaked. If using samba, > the following will show these open files: > > > # smbstatus | grep -i Locked -A1000 > Locked files: > Pid Uid DenyMode Access R/W Oplock > SharePath Name Time > -------------------------------------------------------------------------------------------------- > 25936 501 DENY_NONE 0x80 RDONLY NONE > /home/user1 . Tue Nov 12 12:29:24 2019 > 25936 501 DENY_NONE 0x80 RDONLY NONE > /home/user1 . 
Tue Nov 12 12:29:24 2019
> 25936   501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo  Tue Nov 12 12:29:24 2019
> 25936   501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo  Tue Nov 12 12:29:24 2019
> 25936   501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo  Tue Nov 12 12:29:24 2019
> 25936   501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo  Tue Nov 12 12:29:24 2019
> 25936   501   DENY_NONE  0x120089  RDONLY  LEASE(RWH)  /home/user1  foo  Tue Nov 12 12:29:24 2019
>
> (also tracked https://bugzilla.redhat.com/show_bug.cgi?id=1771691)
>

I don't think it has to do with "CIFS VFS: Close unmatched open"; in fact those are probably the times where it did not leak, unless the SMB2_close() fails with an error. Can you try the patch below and check that these SMB2_close() calls are successful?

I think it is more likely that we leak because we cannot match the SMB2_Create() response with a MID. Do you get "No task to wake, unknown frame received" in the log?

We need to do two things here:

1, Find out why we cannot match this with a MID, and fix it. This is to fix THIS bug.

2, In the demultiplex thread, where we get a "No task to wake, unknown frame received", we should check: is this an Open? If so, we should invoke server->ops->handle_cancelled_mid(). This is to make sure other similar bugs do not cause leaks like this.
regards
ronnie sahlberg


> Frank
> --
> Frank Sorenson
> sorenson@redhat.com
> Principal Software Maintenance Engineer
> Global Support Services - filesystems
> Red Hat
>

[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 1575 bytes --]

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index d78bfcc19156..10cb4d4fb122 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1548,6 +1548,7 @@ struct mid_q_entry {
 };
 
 struct close_cancelled_open {
+	__u64 mid;
 	struct cifs_fid fid;
 	struct cifs_tcon *tcon;
 	struct work_struct work;
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index e311f58dc1c8..6120c307d88e 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -735,12 +735,19 @@ smb2_cancelled_close_fid(struct work_struct *work)
 {
 	struct close_cancelled_open *cancelled = container_of(work,
 					struct close_cancelled_open, work);
+	struct cifs_tcon *tcon = cancelled->tcon;
+	int rc;
 
-	cifs_dbg(VFS, "Close unmatched open\n");
+	cifs_tcon_dbg(VFS, "Close unmatched open for MID:%llx\n",
+		      cancelled->mid);
 
-	SMB2_close(0, cancelled->tcon, cancelled->fid.persistent_fid,
-		   cancelled->fid.volatile_fid);
-	cifs_put_tcon(cancelled->tcon);
+	rc = SMB2_close(0, tcon, cancelled->fid.persistent_fid,
+			cancelled->fid.volatile_fid);
+	if (rc) {
+		cifs_tcon_dbg(VFS, "Close cancelled mid failed with rc:%d\n", rc);
+	}
+
+	cifs_put_tcon(tcon);
 	kfree(cancelled);
 }
 
@@ -770,6 +777,7 @@ smb2_handle_cancelled_mid(char *buffer, struct TCP_Server_Info *server)
 	cancelled->fid.persistent_fid = rsp->PersistentFileId;
 	cancelled->fid.volatile_fid = rsp->VolatileFileId;
 	cancelled->tcon = tcon;
+	cancelled->mid = sync_hdr->MessageId;
 	INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
 	queue_work(cifsiod_wq, &cancelled->work);

^ permalink raw reply related	[flat|nested] 10+ messages in thread
Thread overview: 10+ messages

2019-11-12 19:34 A process killed while opening a file can result in leaked open handle on the server  Frank Sorenson
2019-11-13  2:37 ` Pavel Shilovsky
2019-11-13  5:19 ` Ronnie Sahlberg
2019-11-13  6:49 ` Ronnie Sahlberg
2019-11-13 22:15 ` Frank Sorenson
2019-11-14  1:39 ` Ronnie Sahlberg
2019-11-15 18:21 ` Pavel Shilovskiy
2019-11-17 16:29 ` Frank Sorenson
2019-11-21 19:41 ` Pavel Shilovsky
2019-11-13  3:28 ` Ronnie Sahlberg