LKML Archive on lore.kernel.org
From: Johannes Weiner <hannes@cmpxchg.org>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
	Dave Chinner <david@fromorbit.com>,
	Chris Leech <cleech@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lee Duncan <lduncan@suse.com>,
	open-iscsi@googlegroups.com,
	Linux SCSI List <linux-scsi@vger.kernel.org>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Jones <davej@codemonkey.org.uk>, Jan Kara <jack@suse.cz>
Subject: Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0
Date: Sat, 7 Jan 2017 21:02:00 -0500
Message-ID: <20170108020200.GA16312@cmpxchg.org>
In-Reply-To: <20170103122825.GC3780@quack2.suse.cz>

On Tue, Jan 03, 2017 at 01:28:25PM +0100, Jan Kara wrote:
> On Mon 02-01-17 16:11:36, Johannes Weiner wrote:
> > On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote:
> > > On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote:
> > > > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> > > > > On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > > > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@fromorbit.com> wrote:
> > > > > > > I unmounted the fs, mkfs'd it again, ran the
> > > > > > > workload again and about a minute in this fired:
> > > > > > >
> > > > > > > [628867.607417] ------------[ cut here ]------------
> > > > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220
> > > > > > 
> > > > > > Well, part of the changes during the merge window were the shadow
> > > > > > entry tracking changes that came in through Andrew's tree. Adding
> > > > > > Johannes Weiner to the participants.

Okay, the patch below should address this problem. Dave Jones managed
to reproduce it with the added WARN_ONs, which made the cause obvious.
He can no longer trigger it with this fix applied. Thanks Dave!

Linus? Andrew?

---

From 503eeb20e68bdf3529bdc14aca1ce564880129f2 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 6 Jan 2017 19:21:43 -0500
Subject: [PATCH] mm: workingset: fix use-after-free in shadow node shrinker

Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked
like use-after-free access from the shadow node shrinker. Dave Jones
managed to reproduce the issue with a debug patch applied, which
confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
   dump_stack+0x4f/0x73
   __warn+0xcb/0xf0
   warn_slowpath_null+0x1d/0x20
   delete_node+0x1e4/0x200
   __radix_tree_delete_node+0xd/0x10
   shadow_lru_isolate+0xe6/0x220
   __list_lru_walk_one.isra.4+0x9b/0x190
   ? memcg_drain_all_list_lrus+0x1d0/0x1d0
   list_lru_walk_one+0x23/0x30
   scan_shadow_nodes+0x2e/0x40
   shrink_slab.part.44+0x23d/0x5d0
   ? 0xffffffffa023a077
   shrink_node+0x22c/0x330
   kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in
the inlined radix_tree_shrink().

The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node. While the reclaimed shadow node itself is unlinked by the
shrinker, its deletion from the tree can cause the left-most leaf node
in the tree to be shrunk. If that happens to be a shadow node as well,
we don't unlink it from the LRU as we should.
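
For illustration only, and not part of the patch: a minimal userspace
sketch of the callback contract described above. All names (toy_node,
toy_update_node, toy_set_counts, and the small list helpers) are
hypothetical stand-ins, not the kernel's API. The sketch assumes that a
leaf belongs on the LRU exactly while every populated slot holds a
shadow entry, and that the tree can only keep that linkage correct if
the caller passes the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init_(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical stand-in for a radix tree leaf node. */
struct toy_node {
	unsigned int count;		/* populated slots */
	unsigned int exceptional;	/* slots holding shadow entries */
	struct list_head private_list;	/* linkage on the shadow LRU */
};

/* Callback type: invoked by the tree whenever a leaf node changes. */
typedef void (*toy_update_node_t)(struct toy_node *node, void *private);

/* Keep a node on the LRU exactly while it holds only shadow entries. */
static void toy_update_node(struct toy_node *node, void *private)
{
	struct list_head *lru = private;

	if (node->count && node->count == node->exceptional) {
		if (list_is_empty(&node->private_list))
			list_add_tail_(&node->private_list, lru);
	} else {
		if (!list_is_empty(&node->private_list))
			list_del_init_(&node->private_list);
	}
}

/* Change a node's counters and notify the callback, if one was passed. */
static void toy_set_counts(struct toy_node *node, unsigned int count,
			   unsigned int exceptional,
			   toy_update_node_t update_node, void *private)
{
	node->count = count;
	node->exceptional = exceptional;
	if (update_node)
		update_node(node, private);
}
```

Calling toy_set_counts() with a NULL callback after the node has been
linked leaves private_list untouched, which is the toy equivalent of
the missing unlink described above.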

Consider this tree, where the s are shadow entries:

     root->rnode
          |
     [0       n]
      |       |
   [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

     root->rnode
          |
     [0        ]
      |
  [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

     root->rnode
          |
     [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

     root->rnode
          |
          s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.
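
For illustration only, and not part of the patch: a small userspace
model of the collapse walked through above. Every name here (toy_node,
toy_root, toy_shrink, toy_delete_node, and so on) is hypothetical, the
LRU linkage is reduced to an on_lru flag, and "freeing" only sets a
flag so the model itself never touches freed memory. It assumes a
leaf's entries are packed from slot 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 4

struct toy_node {
	bool leaf;			/* true: slots hold shadow entries */
	bool freed;			/* set instead of calling free() */
	bool on_lru;			/* stand-in for private_list linkage */
	unsigned int count;		/* populated slots */
	struct toy_node *child[SLOTS];	/* used by internal nodes only */
};

struct toy_root {
	struct toy_node *rnode;
	bool has_entry;			/* a lone entry stored in the root */
};

typedef void (*toy_update_node_t)(struct toy_node *node, void *private);

/* Unlink a node from the (modeled) LRU once it no longer holds entries. */
static void toy_update_node(struct toy_node *node, void *private)
{
	(void)private;
	if (node->count == 0)
		node->on_lru = false;
}

static void toy_free(struct toy_node *node) { node->freed = true; }

/* Collapse single-child levels at the top of the tree. */
static void toy_shrink(struct toy_root *root,
		       toy_update_node_t update_node, void *private)
{
	for (;;) {
		struct toy_node *node = root->rnode;

		if (!node || node->count != 1)
			break;
		if (node->leaf) {
			/* The lone shadow entry moves into the root itself. */
			root->rnode = NULL;
			root->has_entry = true;
			node->count = 0;
			if (update_node)
				update_node(node, private);
		} else {
			if (!node->child[0])
				break;
			root->rnode = node->child[0];
		}
		toy_free(node);
	}
}

/* Remove a reclaimed leaf from its parent, then try to shrink the tree. */
static void toy_delete_node(struct toy_root *root, struct toy_node *parent,
			    unsigned int slot,
			    toy_update_node_t update_node, void *private)
{
	struct toy_node *node = parent->child[slot];

	parent->child[slot] = NULL;
	parent->count--;
	toy_free(node);
	toy_shrink(root, update_node, private);
}

/*
 * Build the two-leaf tree, reclaim the right leaf, and report whether
 * the left leaf ended up freed while still marked as being on the LRU.
 */
static bool toy_reclaim_leaves_stale_lru_entry(toy_update_node_t update_node)
{
	struct toy_node left  = { .leaf = true, .on_lru = true, .count = 1 };
	struct toy_node right = { .leaf = true, .on_lru = true, .count = SLOTS };
	struct toy_node top   = { .count = 2 };
	struct toy_root root  = { .rnode = &top };

	top.child[0] = &left;
	top.child[SLOTS - 1] = &right;

	/* The shrinker unlinks and empties the node it reclaims ... */
	right.on_lru = false;
	right.count = 0;
	/* ... then deletes it, which may collapse the left-most branch. */
	toy_delete_node(&root, &top, SLOTS - 1, update_node, NULL);

	return left.freed && left.on_lru;
}
```

In this model, toy_reclaim_leaves_stale_lru_entry(NULL) reports a stale
LRU entry, while passing toy_update_node does not.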

Also warn as soon as a node that is still linked to a list is freed,
rather than waiting for the use-after-free when the list is scanned
much later.
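
For illustration only, and not part of the patch: a userspace sketch of
the "warn at free time" idea. The names (toy_node, toy_node_alloc,
toy_node_free, toy_warnings) are hypothetical, and the warning is a
plain fprintf() plus a counter rather than the kernel's WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical node that may be linked onto an external list. */
struct toy_node {
	struct list_head private_list;
};

static unsigned int toy_warnings;	/* how many times the check fired */

static struct toy_node *toy_node_alloc(void)
{
	struct toy_node *node = malloc(sizeof(*node));

	if (node)
		list_init(&node->private_list);
	return node;
}

/* Free a node, complaining first if it is still linked on some list. */
static void toy_node_free(struct toy_node *node)
{
	if (!list_is_empty(&node->private_list)) {
		toy_warnings++;
		fprintf(stderr, "warning: freeing node %p while still linked\n",
			(void *)node);
	}
	free(node);
}
```

The check costs one pointer comparison per free and reports the problem
at the point where the stale link is created, not when the list is
next walked.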

Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/radix-tree.h |  4 +++-
 lib/radix-tree.c           | 11 +++++++++--
 mm/workingset.c            |  3 ++-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 5dea8f6440e4..52bda854593b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -306,7 +306,9 @@ void radix_tree_iter_replace(struct radix_tree_root *,
 void radix_tree_replace_slot(struct radix_tree_root *root,
 			     void **slot, void *item);
 void __radix_tree_delete_node(struct radix_tree_root *root,
-			      struct radix_tree_node *node);
+			      struct radix_tree_node *node,
+			      radix_tree_update_node_t update_node,
+			      void *private);
 void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_delete(struct radix_tree_root *, unsigned long);
 void radix_tree_clear_tags(struct radix_tree_root *root,
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6f382e07de77..0b92d605fb69 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -640,6 +640,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root,
 				update_node(node, private);
 		}
 
+		WARN_ON_ONCE(!list_empty(&node->private_list));
 		radix_tree_node_free(node);
 	}
 }
@@ -666,6 +667,7 @@ static void delete_node(struct radix_tree_root *root,
 			root->rnode = NULL;
 		}
 
+		WARN_ON_ONCE(!list_empty(&node->private_list));
 		radix_tree_node_free(node);
 
 		node = parent;
@@ -767,6 +769,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node)
 			struct radix_tree_node *old = child;
 			offset = child->offset + 1;
 			child = child->parent;
+			WARN_ON_ONCE(!list_empty(&old->private_list));
 			radix_tree_node_free(old);
 			if (old == entry_to_node(node))
 				return;
@@ -1824,15 +1827,19 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
  *	__radix_tree_delete_node    -    try to free node after clearing a slot
  *	@root:		radix tree root
  *	@node:		node containing @index
+ *	@update_node:	callback for changing leaf nodes
+ *	@private:	private data to pass to @update_node
  *
  *	After clearing the slot at @index in @node from radix tree
  *	rooted at @root, call this function to attempt freeing the
  *	node and shrinking the tree.
  */
 void __radix_tree_delete_node(struct radix_tree_root *root,
-			      struct radix_tree_node *node)
+			      struct radix_tree_node *node,
+			      radix_tree_update_node_t update_node,
+			      void *private)
 {
-	delete_node(root, node, NULL, NULL);
+	delete_node(root, node, update_node, private);
 }
 
 /**
diff --git a/mm/workingset.c b/mm/workingset.c
index 241fa5d6b3b2..abb58ffa3c64 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -473,7 +473,8 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	if (WARN_ON_ONCE(node->exceptional))
 		goto out_invalid;
 	inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM);
-	__radix_tree_delete_node(&mapping->page_tree, node);
+	__radix_tree_delete_node(&mapping->page_tree, node,
+				 workingset_update_node, mapping);
 
 out_invalid:
 	spin_unlock(&mapping->tree_lock);
-- 
2.10.2
