All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Wysochanski <dwysocha@redhat.com>
To: Pavel Shilovskiy <pshilov@microsoft.com>,
	Ronnie Sahlberg <lsahlber@redhat.com>
Cc: linux-cifs@vger.kernel.org, Frank Sorenson <sorenson@redhat.com>
Subject: [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect
Date: Sat, 19 Oct 2019 19:35:05 -0400	[thread overview]
Message-ID: <20191019233505.14191-1-dwysocha@redhat.com> (raw)
In-Reply-To: <DM5PR21MB0185857761946DBE12AEB3ADB66C0@DM5PR21MB0185.namprd21.prod.outlook.com>

This is a second attempt at the fix for the list_del corruption
issue.  This patch adds a similar refcount approach in 
clean_demultiplex_info() since that was noticed.  However, for
some reason I now get the list_del corruption come back fairly
quickly (after a few minutes) and eventually a softlockup.
I have not tracked down the problem yet.


There's a race between the demultiplexer thread and the request
issuing thread similar to the race described in
commit 696e420bb2a6 ("cifs: Fix use after free of a mid_q_entry")
where both threads may obtain and attempt to call list_del_init
on the same mid and a list_del corruption similar to the
following will result:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
[  430.478129] Code: 5e 15 8e e8 54 a3 c5 ff 0f 0b 48 c7 c7 78 5f 15 8e e8 46 a3 c5 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 38 5f 15 8e e8 32 a3 c5 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 00 5f 15 8e e8 1e a3 c5 ff 0f 0b
[  430.485563] RSP: 0018:ffffb4db0042fd38 EFLAGS: 00010246
[  430.487665] RAX: 0000000000000054 RBX: ffff98d3aabb8800 RCX: 0000000000000000
[  430.490513] RDX: 0000000000000000 RSI: ffff98d3b7a17908 RDI: ffff98d3b7a17908
[  430.493383] RBP: ffff98d3a8f316c0 R08: ffff98d3b7a17908 R09: 0000000000000285
[  430.496258] R10: ffffb4db0042fbf0 R11: ffffb4db0042fbf5 R12: ffff98d3aabb89c0
[  430.499113] R13: ffffb4db0042fd48 R14: 2e885cb266355469 R15: ffff98d3b24c4480
[  430.501981] FS:  0000000000000000(0000) GS:ffff98d3b7a00000(0000) knlGS:0000000000000000
[  430.505232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  430.507546] CR2: 00007f08cd17b9c0 CR3: 000000023484a000 CR4: 00000000000406f0
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

To fix the above, inside cifs_reconnect unconditionally set the
state to MID_RETRY_NEEDED, and then take a reference before we
move any mid_q_entry on server->pending_mid_q to the temporary
retry_list.  Then while processing retry_list make sure we check
the state is still MID_RETRY_NEEDED while holding GlobalMid_Lock
before calling list_del_init.  Then after mid_q_entry callback
has been completed, drop the reference.  In the code paths for
request issuing thread, avoid calling list_del_init if we
notice mid->mid_state != MID_RETRY_NEEDED, avoiding the
race and duplicate call to list_del_init.  In addition to
the above MID_RETRY_NEEDED case, handle the MID_SHUTDOWN case
in a similar fashion to avoid the possibility of a similar
crash.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
---
 fs/cifs/connect.c   | 30 ++++++++++++++++++++++--------
 fs/cifs/transport.c |  8 ++++++--
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a64dfa95a925..0327bace214d 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -564,8 +564,9 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	spin_lock(&GlobalMid_Lock);
 	list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-		if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
-			mid_entry->mid_state = MID_RETRY_NEEDED;
+		kref_get(&mid_entry->refcount);
+		WARN_ON(mid_entry->mid_state != MID_REQUEST_SUBMITTED);
+		mid_entry->mid_state = MID_RETRY_NEEDED;
 		list_move(&mid_entry->qhead, &retry_list);
 	}
 	spin_unlock(&GlobalMid_Lock);
@@ -574,8 +575,12 @@ cifs_reconnect(struct TCP_Server_Info *server)
 	cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__);
 	list_for_each_safe(tmp, tmp2, &retry_list) {
 		mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
-		list_del_init(&mid_entry->qhead);
+		spin_lock(&GlobalMid_Lock);
+		if (mid_entry->mid_state == MID_RETRY_NEEDED)
+			list_del_init(&mid_entry->qhead);
+		spin_unlock(&GlobalMid_Lock);
 		mid_entry->callback(mid_entry);
+		cifs_mid_q_entry_release(mid_entry);
 	}
 
 	if (cifs_rdma_enabled(server)) {
@@ -884,10 +889,6 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
 	mid->when_received = jiffies;
 #endif
 	spin_lock(&GlobalMid_Lock);
-	if (!malformed)
-		mid->mid_state = MID_RESPONSE_RECEIVED;
-	else
-		mid->mid_state = MID_RESPONSE_MALFORMED;
 	/*
 	 * Trying to handle/dequeue a mid after the send_recv()
 	 * function has finished processing it is a bug.
@@ -895,8 +896,15 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed)
 	if (mid->mid_flags & MID_DELETED)
 		printk_once(KERN_WARNING
 			    "trying to dequeue a deleted mid\n");
-	else
+	if (mid->mid_state != MID_RETRY_NEEDED &&
+	    mid->mid_state != MID_SHUTDOWN)
 		list_del_init(&mid->qhead);
+
+	if (!malformed)
+		mid->mid_state = MID_RESPONSE_RECEIVED;
+	else
+		mid->mid_state = MID_RESPONSE_MALFORMED;
+
 	spin_unlock(&GlobalMid_Lock);
 }
 
@@ -966,6 +974,7 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
 		list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 			cifs_dbg(FYI, "Clearing mid 0x%llx\n", mid_entry->mid);
+			kref_get(&mid_entry->refcount);
 			mid_entry->mid_state = MID_SHUTDOWN;
 			list_move(&mid_entry->qhead, &dispose_list);
 		}
@@ -975,8 +984,13 @@ static void clean_demultiplex_info(struct TCP_Server_Info *server)
 		list_for_each_safe(tmp, tmp2, &dispose_list) {
 			mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
 			cifs_dbg(FYI, "Callback mid 0x%llx\n", mid_entry->mid);
+			spin_lock(&GlobalMid_Lock);
+			if (mid_entry->mid_state == MID_SHUTDOWN)
+				list_del_init(&mid_entry->qhead);
+			spin_unlock(&GlobalMid_Lock);
 			list_del_init(&mid_entry->qhead);
 			mid_entry->callback(mid_entry);
+			cifs_mid_q_entry_release(mid_entry);
 		}
 		/* 1/8th of sec is more than enough time for them to exit */
 		msleep(125);
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 308ad0f495e1..d1794bd664ae 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -173,7 +173,9 @@ void
 cifs_delete_mid(struct mid_q_entry *mid)
 {
 	spin_lock(&GlobalMid_Lock);
-	list_del_init(&mid->qhead);
+	if (mid->mid_state != MID_RETRY_NEEDED &&
+	    mid->mid_state != MID_SHUTDOWN)
+		list_del_init(&mid->qhead);
 	mid->mid_flags |= MID_DELETED;
 	spin_unlock(&GlobalMid_Lock);
 
@@ -872,7 +874,9 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server)
 		rc = -EHOSTDOWN;
 		break;
 	default:
-		list_del_init(&mid->qhead);
+		if (mid->mid_state != MID_RETRY_NEEDED &&
+		    mid->mid_state != MID_SHUTDOWN)
+			list_del_init(&mid->qhead);
 		cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n",
 			 __func__, mid->mid, mid->mid_state);
 		rc = -EIO;
-- 
2.21.0


  parent reply	other threads:[~2019-10-19 23:35 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-16 19:27 list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 David Wysochanski
2019-10-17  0:17 ` Ronnie Sahlberg
2019-10-17  9:05   ` Ronnie Sahlberg
2019-10-17 11:42     ` David Wysochanski
2019-10-17 14:08       ` Ronnie Sahlberg
2019-10-17 15:29         ` David Wysochanski
2019-10-17 18:29           ` Pavel Shilovskiy
2019-10-17 19:23             ` David Wysochanski
2019-10-17 19:58               ` Pavel Shilovskiy
2019-10-17 20:34                 ` David Wysochanski
2019-10-17 21:44                   ` Ronnie Sahlberg
2019-10-17 22:02                     ` Pavel Shilovskiy
2019-10-17 22:53                       ` Ronnie Sahlberg
2019-10-17 23:20                         ` Pavel Shilovskiy
2019-10-17 23:41                           ` Ronnie Sahlberg
2019-10-18  8:16                         ` David Wysochanski
2019-10-18  9:27                           ` Ronnie Sahlberg
2019-10-18 10:12                             ` David Wysochanski
2019-10-18 20:59                               ` Pavel Shilovskiy
2019-10-18 21:21                                 ` David Wysochanski
2019-10-18 21:44                                   ` David Wysochanski
2019-10-18 22:45                                     ` Pavel Shilovskiy
2019-10-19 11:09                                       ` David Wysochanski
2019-10-21 21:54                                         ` Pavel Shilovsky
2019-10-22 18:39                                           ` David Wysochanski
2019-10-22 21:20                                             ` ronnie sahlberg
2019-10-22 21:25                                               ` Pavel Shilovsky
2019-10-22 21:32                                                 ` ronnie sahlberg
2019-10-19 23:35                                 ` Dave Wysochanski [this message]
2019-10-21 22:34                                   ` [RFC PATCH v2] cifs: Fix list_del corruption of retry_list in cifs_reconnect Pavel Shilovsky
2019-10-19  9:44                           ` list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 Ronnie Sahlberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191019233505.14191-1-dwysocha@redhat.com \
    --to=dwysocha@redhat.com \
    --cc=linux-cifs@vger.kernel.org \
    --cc=lsahlber@redhat.com \
    --cc=pshilov@microsoft.com \
    --cc=sorenson@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.