All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Howells <dhowells@redhat.com>,
	Jeremy McNicoll <jeremymc@redhat.com>,
	Frank Sorenson <sorenson@redhat.com>,
	Benjamin Coddington <bcodding@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Sasha Levin <alexander.levin@verizon.com>
Subject: [PATCH 4.4 22/46] fscache: Fix dead object requeue
Date: Thu, 15 Jun 2017 19:52:41 +0200	[thread overview]
Message-ID: <20170615175219.332063508@linuxfoundation.org> (raw)
In-Reply-To: <20170615175218.286057711@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Howells <dhowells@redhat.com>


[ Upstream commit e26bfebdfc0d212d366de9990a096665d5c0209a ]

Under some circumstances, an fscache object can become queued such that it
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking the
for the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff8109d5db>] process_one_work+0x17b/0x470
 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
 [<ffffffff810a5acf>] kthread+0xcf/0xe0
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff816460d8>] ret_from_fork+0x58/0x90
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeremy McNicoll <jeremymc@redhat.com>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/fscache/object.c           |   26 ++++++++++++++++++++++++--
 include/linux/fscache-cache.h |    1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

--- a/fs/fscache/object.c
+++ b/fs/fscache/object.c
@@ -30,6 +30,7 @@ static const struct fscache_state *fscac
 static const struct fscache_state *fscache_object_available(struct fscache_object *, int);
 static const struct fscache_state *fscache_parent_ready(struct fscache_object *, int);
 static const struct fscache_state *fscache_update_object(struct fscache_object *, int);
+static const struct fscache_state *fscache_object_dead(struct fscache_object *, int);
 
 #define __STATE_NAME(n) fscache_osm_##n
 #define STATE(n) (&__STATE_NAME(n))
@@ -91,7 +92,7 @@ static WORK_STATE(LOOKUP_FAILURE,	"LCFL"
 static WORK_STATE(KILL_OBJECT,		"KILL", fscache_kill_object);
 static WORK_STATE(KILL_DEPENDENTS,	"KDEP", fscache_kill_dependents);
 static WORK_STATE(DROP_OBJECT,		"DROP", fscache_drop_object);
-static WORK_STATE(OBJECT_DEAD,		"DEAD", (void*)2UL);
+static WORK_STATE(OBJECT_DEAD,		"DEAD", fscache_object_dead);
 
 static WAIT_STATE(WAIT_FOR_INIT,	"?INI",
 		  TRANSIT_TO(INIT_OBJECT,	1 << FSCACHE_OBJECT_EV_NEW_CHILD));
@@ -229,6 +230,10 @@ execute_work_state:
 	event = -1;
 	if (new_state == NO_TRANSIT) {
 		_debug("{OBJ%x} %s notrans", object->debug_id, state->name);
+		if (unlikely(state == STATE(OBJECT_DEAD))) {
+			_leave(" [dead]");
+			return;
+		}
 		fscache_enqueue_object(object);
 		event_mask = object->oob_event_mask;
 		goto unmask_events;
@@ -239,7 +244,7 @@ execute_work_state:
 	object->state = state = new_state;
 
 	if (state->work) {
-		if (unlikely(state->work == ((void *)2UL))) {
+		if (unlikely(state == STATE(OBJECT_DEAD))) {
 			_leave(" [dead]");
 			return;
 		}
@@ -1077,3 +1082,20 @@ void fscache_object_mark_killed(struct f
 	}
 }
 EXPORT_SYMBOL(fscache_object_mark_killed);
+
+/*
+ * The object is dead.  We can get here if an object gets queued by an event
+ * that would lead to its death (such as EV_KILL) when the dispatcher is
+ * already running (and so can be requeued) but hasn't yet cleared the event
+ * mask.
+ */
+static const struct fscache_state *fscache_object_dead(struct fscache_object *object,
+						       int event)
+{
+	if (!test_and_set_bit(FSCACHE_OBJECT_RUN_AFTER_DEAD,
+			      &object->flags))
+		return NO_TRANSIT;
+
+	WARN(true, "FS-Cache object redispatched after death");
+	return NO_TRANSIT;
+}
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -360,6 +360,7 @@ struct fscache_object {
 #define FSCACHE_OBJECT_IS_AVAILABLE	5	/* T if object has become active */
 #define FSCACHE_OBJECT_RETIRED		6	/* T if object was retired on relinquishment */
 #define FSCACHE_OBJECT_KILLED_BY_CACHE	7	/* T if object was killed by the cache */
+#define FSCACHE_OBJECT_RUN_AFTER_DEAD	8	/* T if object has been dispatched after death */
 
 	struct list_head	cache_link;	/* link in cache->object_list */
 	struct hlist_node	cookie_link;	/* link in cookie->backing_objects */

  parent reply	other threads:[~2017-06-15 17:56 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-15 17:52 [PATCH 4.4 00/46] 4.4.73-stable review Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 01/46] s390/vmem: fix identity mapping Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 02/46] partitions/msdos: FreeBSD UFS2 file systems are not recognized Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 03/46] ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 04/46] staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 05/46] Call echo service immediately after socket reconnect Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 06/46] net: xilinx_emaclite: fix freezes due to unordered I/O Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 07/46] net: xilinx_emaclite: fix receive buffer overflow Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 08/46] ipv6: Handle IPv4-mapped src to in6addr_any dst Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 09/46] ipv6: Inhibit IPv4-mapped src address on the wire Greg Kroah-Hartman
2017-06-29 12:13   ` Ben Hutchings
2017-06-29 12:35     ` Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 10/46] NET: Fix /proc/net/arp for AX.25 Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 11/46] NET: mkiss: Fix panic Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 12/46] net: hns: Fix the device being used for dma mapping during TX Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 15/46] i2c: piix4: Fix request_region size Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 17/46] PM / runtime: Avoid false-positive warnings from might_sleep_if() Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 18/46] jump label: pass kbuild_cflags when checking for asm goto support Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 19/46] kasan: respect /proc/sys/kernel/traceoff_on_warning Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 20/46] log2: make order_base_2() behave correctly on const input value zero Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 21/46] ethtool: do not vzalloc(0) on registers dump Greg Kroah-Hartman
2017-06-15 17:52 ` Greg Kroah-Hartman [this message]
2017-06-15 17:52 ` [PATCH 4.4 23/46] fscache: Clear outstanding writes when disabling a cookie Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 24/46] FS-Cache: Initialise stores_lock in netfs cookie Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 25/46] ipv6: fix flow labels when the traffic class is non-0 Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 26/46] drm/nouveau: prevent userspace from deleting client object Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 27/46] drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 28/46] net/mlx4_core: Avoid command timeouts during VF driver device shutdown Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 29/46] gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 30/46] pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 31/46] net: adaptec: starfire: add checks for dma mapping errors Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 32/46] parisc, parport_gsc: Fixes for printk continuation lines Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 33/46] drm/nouveau: Dont enabling polling twice on runtime resume Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 34/46] drm/ast: Fixed system hanged if disable P2A Greg Kroah-Hartman
2017-06-29 13:45   ` Ben Hutchings
2017-07-03  7:27     ` Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 35/46] ravb: unmap descriptors when freeing rings Greg Kroah-Hartman
2017-06-29 13:58   ` Ben Hutchings
2017-07-03 12:52     ` Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 36/46] nfs: Fix "Dont increment lock sequence ID after NFS4ERR_MOVED" Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 37/46] r8152: re-schedule napi for tx Greg Kroah-Hartman
2017-06-29 14:23   ` Ben Hutchings
2017-06-15 17:52 ` [PATCH 4.4 38/46] r8152: fix rtl8152_post_reset function Greg Kroah-Hartman
2017-06-15 17:52 ` [PATCH 4.4 39/46] r8152: avoid start_xmit to schedule napi when napi is disabled Greg Kroah-Hartman
2017-06-29 14:35   ` Ben Hutchings
2017-06-15 17:52 ` [PATCH 4.4 40/46] sctp: sctp_addr_id2transport should verify the addr before looking up assoc Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 41/46] romfs: use different way to generate fsid for BLOCK or MTD Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 42/46] proc: add a schedule point in proc_pid_readdir() Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 43/46] tipc: ignore requests when the connection state is not CONNECTED Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 44/46] xtensa: dont use linux IRQ #0 Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 45/46] s390/kvm: do not rely on the ILC on kvm host protection fauls Greg Kroah-Hartman
2017-06-15 17:53 ` [PATCH 4.4 46/46] sparc64: make string buffers large enough Greg Kroah-Hartman
2017-06-15 22:24 ` [PATCH 4.4 00/46] 4.4.73-stable review Shuah Khan
2017-06-16  0:39 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170615175219.332063508@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=alexander.levin@verizon.com \
    --cc=bcodding@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=jeremymc@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sorenson@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.