* [PATCH v3] cifs: ignore cached share root handle closing errors
@ 2020-04-06 11:34 Aurelien Aptel
2020-04-06 17:22 ` Pavel Shilovsky
0 siblings, 1 reply; 4+ messages in thread
From: Aurelien Aptel @ 2020-04-06 11:34 UTC (permalink / raw)
To: linux-cifs; +Cc: smfrench, Aurelien Aptel, Steve French
Fix tcon use-after-free and NULL ptr deref.
Customer system crashes with the following kernel log:
[462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
[462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347107] CIFS VFS: Close unmatched open
[462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
...
[exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
#6 [...] smb2_cancelled_close_fid at ... [cifs]
#7 [...] process_one_work at ...
#8 [...] worker_thread at ...
#9 [...] kthread at ...
The most likely explanation we have is:
* When we put the last reference of a tcon (refcount=0), we close the
cached share root handle.
* If closing a handle is interrupted, SMB2_close() will
queue a SMB2_close() in a work thread.
* The queued object keeps a tcon ref so we bump the tcon
refcount, jumping from 0 to 1.
* We reach the end of cifs_put_tcon(), we free the tcon object despite
it now having a refcount of 1.
* The queued work now runs, but the tcon, ses & server was freed in
the meantime resulting in a crash.
THREAD 1
========
cifs_put_tcon => tcon refcount reach 0
SMB2_tdis
close_shroot_lease
close_shroot_lease_locked => if cached root has lease && refcount = 0
smb2_close_cached_fid => if cached root valid
SMB2_close => retry close in a thread if interrupted
smb2_handle_cancelled_close
__smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
queue_work(cifsiod_wq, &cancelled->work) => queue work
tconInfoFree(tcon); ==> freed!
cifs_put_smb_ses(ses); ==> freed!
THREAD 2 (workqueue)
========
smb2_cancelled_close_fid
SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
*CRASH*
Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
---
changes since v2:
* add WARN_ONCE
* show server hostname & tid
* VFS -> FYI
fs/cifs/smb2misc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 0511aaf451d4..cfa57e3ad352 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -766,6 +766,21 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count);
spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tc_count <= 0) {
+ const char *host = "(null)";
+
+ WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative");
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ if (tcon->ses && tcon->ses->server
+ && tcon->ses->server->hostname)
+ host = tcon->ses->server->hostname;
+
+ cifs_dbg(FYI, "\\\\%s tid=%u: tcon is closing, skipping async close retry of fid %llu %llu\n",
+ host, tcon->tid, persistent_fid, volatile_fid);
+
+ return 0;
+ }
tcon->tc_count++;
spin_unlock(&cifs_tcp_ses_lock);
--
2.16.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v3] cifs: ignore cached share root handle closing errors
2020-04-06 11:34 [PATCH v3] cifs: ignore cached share root handle closing errors Aurelien Aptel
@ 2020-04-06 17:22 ` Pavel Shilovsky
2020-04-07 9:49 ` [PATCH v4] " Aurelien Aptel
0 siblings, 1 reply; 4+ messages in thread
From: Pavel Shilovsky @ 2020-04-06 17:22 UTC (permalink / raw)
To: Aurelien Aptel; +Cc: linux-cifs, Steve French, Steve French
пн, 6 апр. 2020 г. в 04:34, Aurelien Aptel <aaptel@suse.com>:
>
> Fix tcon use-after-free and NULL ptr deref.
>
> Customer system crashes with the following kernel log:
>
> [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
> [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.347107] CIFS VFS: Close unmatched open
> [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> ...
> [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
> #6 [...] smb2_cancelled_close_fid at ... [cifs]
> #7 [...] process_one_work at ...
> #8 [...] worker_thread at ...
> #9 [...] kthread at ...
>
> The most likely explanation we have is:
>
> * When we put the last reference of a tcon (refcount=0), we close the
> cached share root handle.
> * If closing a handle is interrupted, SMB2_close() will
> queue a SMB2_close() in a work thread.
> * The queued object keeps a tcon ref so we bump the tcon
> refcount, jumping from 0 to 1.
> * We reach the end of cifs_put_tcon(), we free the tcon object despite
> it now having a refcount of 1.
> * The queued work now runs, but the tcon, ses & server was freed in
> the meantime resulting in a crash.
>
> THREAD 1
> ========
> cifs_put_tcon => tcon refcount reach 0
> SMB2_tdis
> close_shroot_lease
> close_shroot_lease_locked => if cached root has lease && refcount = 0
> smb2_close_cached_fid => if cached root valid
> SMB2_close => retry close in a thread if interrupted
> smb2_handle_cancelled_close
> __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
> INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
> queue_work(cifsiod_wq, &cancelled->work) => queue work
> tconInfoFree(tcon); ==> freed!
> cifs_put_smb_ses(ses); ==> freed!
>
> THREAD 2 (workqueue)
> ========
> smb2_cancelled_close_fid
> SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
> cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
> *CRASH*
>
> Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease")
> Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> Signed-off-by: Steve French <stfrench@microsoft.com>
> ---
> changes since v2:
> * add WARN_ONCE
> * show server hostname & tid
> * VFS -> FYI
>
> fs/cifs/smb2misc.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
> index 0511aaf451d4..cfa57e3ad352 100644
> --- a/fs/cifs/smb2misc.c
> +++ b/fs/cifs/smb2misc.c
> @@ -766,6 +766,21 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
>
> cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count);
> spin_lock(&cifs_tcp_ses_lock);
> + if (tcon->tc_count <= 0) {
> + const char *host = "(null)";
> +
> + WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative");
> + spin_unlock(&cifs_tcp_ses_lock);
> +
> + if (tcon->ses && tcon->ses->server
> + && tcon->ses->server->hostname)
> + host = tcon->ses->server->hostname;
> +
> + cifs_dbg(FYI, "\\\\%s tid=%u: tcon is closing, skipping async close retry of fid %llu %llu\n",
> + host, tcon->tid, persistent_fid, volatile_fid);
You can use cifs_server_dbg() directly - it is adding prefix with
server name automatically, you just need a server variable in the
calling function.
> +
> + return 0;
> + }
> tcon->tc_count++;
> spin_unlock(&cifs_tcp_ses_lock);
>
> --
> 2.16.4
>
--
Best regards,
Pavel Shilovsky
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v4] cifs: ignore cached share root handle closing errors
2020-04-06 17:22 ` Pavel Shilovsky
@ 2020-04-07 9:49 ` Aurelien Aptel
2020-04-07 16:17 ` Pavel Shilovsky
0 siblings, 1 reply; 4+ messages in thread
From: Aurelien Aptel @ 2020-04-07 9:49 UTC (permalink / raw)
To: linux-cifs; +Cc: smfrench, Aurelien Aptel, Steve French
Fix tcon use-after-free and NULL ptr deref.
Customer system crashes with the following kernel log:
[462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
[462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
[462233.347107] CIFS VFS: Close unmatched open
[462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
...
[exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
#6 [...] smb2_cancelled_close_fid at ... [cifs]
#7 [...] process_one_work at ...
#8 [...] worker_thread at ...
#9 [...] kthread at ...
The most likely explanation we have is:
* When we put the last reference of a tcon (refcount=0), we close the
cached share root handle.
* If closing a handle is interrupted, SMB2_close() will
queue a SMB2_close() in a work thread.
* The queued object keeps a tcon ref so we bump the tcon
refcount, jumping from 0 to 1.
* We reach the end of cifs_put_tcon(), we free the tcon object despite
it now having a refcount of 1.
* The queued work now runs, but the tcon, ses & server was freed in
the meantime resulting in a crash.
THREAD 1
========
cifs_put_tcon => tcon refcount reach 0
SMB2_tdis
close_shroot_lease
close_shroot_lease_locked => if cached root has lease && refcount = 0
smb2_close_cached_fid => if cached root valid
SMB2_close => retry close in a thread if interrupted
smb2_handle_cancelled_close
__smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
queue_work(cifsiod_wq, &cancelled->work) => queue work
tconInfoFree(tcon); ==> freed!
cifs_put_smb_ses(ses); ==> freed!
THREAD 2 (workqueue)
========
smb2_cancelled_close_fid
SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
*CRASH*
Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
---
fs/cifs/smb2misc.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 0511aaf451d4..497afb0b9960 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -766,6 +766,20 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count);
spin_lock(&cifs_tcp_ses_lock);
+ if (tcon->tc_count <= 0) {
+ struct TCP_Server_Info *server = NULL;
+
+ WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative");
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ if (tcon->ses)
+ server = tcon->ses->server;
+
+ cifs_server_dbg(FYI, "tid=%u: tcon is closing, skipping async close retry of fid %llu %llu\n",
+ tcon->tid, persistent_fid, volatile_fid);
+
+ return 0;
+ }
tcon->tc_count++;
spin_unlock(&cifs_tcp_ses_lock);
--
2.16.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v4] cifs: ignore cached share root handle closing errors
2020-04-07 9:49 ` [PATCH v4] " Aurelien Aptel
@ 2020-04-07 16:17 ` Pavel Shilovsky
0 siblings, 0 replies; 4+ messages in thread
From: Pavel Shilovsky @ 2020-04-07 16:17 UTC (permalink / raw)
To: Aurelien Aptel; +Cc: linux-cifs, Steve French, Steve French
вт, 7 апр. 2020 г. в 02:50, Aurelien Aptel <aaptel@suse.com>:
>
> Fix tcon use-after-free and NULL ptr deref.
>
> Customer system crashes with the following kernel log:
>
> [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR
> [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4
> [462233.347107] CIFS VFS: Close unmatched open
> [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> ...
> [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server)
> #6 [...] smb2_cancelled_close_fid at ... [cifs]
> #7 [...] process_one_work at ...
> #8 [...] worker_thread at ...
> #9 [...] kthread at ...
>
> The most likely explanation we have is:
>
> * When we put the last reference of a tcon (refcount=0), we close the
> cached share root handle.
> * If closing a handle is interrupted, SMB2_close() will
> queue a SMB2_close() in a work thread.
> * The queued object keeps a tcon ref so we bump the tcon
> refcount, jumping from 0 to 1.
> * We reach the end of cifs_put_tcon(), we free the tcon object despite
> it now having a refcount of 1.
> * The queued work now runs, but the tcon, ses & server was freed in
> the meantime resulting in a crash.
>
> THREAD 1
> ========
> cifs_put_tcon => tcon refcount reach 0
> SMB2_tdis
> close_shroot_lease
> close_shroot_lease_locked => if cached root has lease && refcount = 0
> smb2_close_cached_fid => if cached root valid
> SMB2_close => retry close in a thread if interrupted
> smb2_handle_cancelled_close
> __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !!
> INIT_WORK(&cancelled->work, smb2_cancelled_close_fid);
> queue_work(cifsiod_wq, &cancelled->work) => queue work
> tconInfoFree(tcon); ==> freed!
> cifs_put_smb_ses(ses); ==> freed!
>
> THREAD 2 (workqueue)
> ========
> smb2_cancelled_close_fid
> SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon
> cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time
> *CRASH*
>
> Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease")
> Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> Signed-off-by: Steve French <stfrench@microsoft.com>
> ---
> fs/cifs/smb2misc.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
> index 0511aaf451d4..497afb0b9960 100644
> --- a/fs/cifs/smb2misc.c
> +++ b/fs/cifs/smb2misc.c
> @@ -766,6 +766,20 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
>
> cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count);
> spin_lock(&cifs_tcp_ses_lock);
> + if (tcon->tc_count <= 0) {
> + struct TCP_Server_Info *server = NULL;
> +
> + WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative");
> + spin_unlock(&cifs_tcp_ses_lock);
> +
> + if (tcon->ses)
> + server = tcon->ses->server;
> +
> + cifs_server_dbg(FYI, "tid=%u: tcon is closing, skipping async close retry of fid %llu %llu\n",
> + tcon->tid, persistent_fid, volatile_fid);
> +
> + return 0;
> + }
> tcon->tc_count++;
> spin_unlock(&cifs_tcp_ses_lock);
>
> --
> 2.16.4
>
Looks good, thanks!
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
--
Best regards,
Pavel Shilovsky
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-04-07 16:17 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-06 11:34 [PATCH v3] cifs: ignore cached share root handle closing errors Aurelien Aptel
2020-04-06 17:22 ` Pavel Shilovsky
2020-04-07 9:49 ` [PATCH v4] " Aurelien Aptel
2020-04-07 16:17 ` Pavel Shilovsky
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.