From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-nvdimm@lists.01.org, Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	stable@vger.kernel.org
Subject: Re: [PATCH 5/5] radix tree: fix multi-order iteration race
Date: Wed, 9 May 2018 09:09:38 -0600
Message-ID: <20180509150938.GA3814@linux.intel.com>
In-Reply-To: <20180509124611.6hoa743z4qrx6bgc@quack2.suse.cz>

On Wed, May 09, 2018 at 02:46:11PM +0200, Jan Kara wrote:
> On Thu 03-05-18 13:24:30, Ross Zwisler wrote:
> > Fix a race in the multi-order iteration code which causes the kernel to hit
> > a GP fault.  This was first seen with a production v4.15 based kernel
> > (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used order 9 PMD
> > DAX entries.
> > 
> > The race has to do with how we tear down multi-order sibling entries when
> > we are removing an item from the tree.  Remember for example that an order
> > 2 entry looks like this:
> > 
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> > 
> > where 'entry' is in some slot in the struct radix_tree_node, and the three
> > slots following 'entry' contain sibling pointers which point back to
> > 'entry.'
> > 
> > When we delete 'entry' from the tree, we call:
> >   radix_tree_delete()
> >     radix_tree_delete_item()
> >       __radix_tree_delete()
> >         replace_slot()
> > 
> > replace_slot() first removes the siblings in order from the first to the
> > last, and then replaces 'entry' with NULL.  This means that for a brief
> > period of time we end up with one or more of the siblings removed, so:
> > 
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> > 
> > This causes an issue if you have a reader iterating over the slots in the
> > tree via radix_tree_for_each_slot() while only under
> > rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
> > mm/filemap.c.
> > 
> > The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
> > skip over the sibling entries in the slots, it currently does so with an
> > exact match on the slot directly preceding our current slot.  Normally this
> > works:
> >                                     V preceding slot
> > struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
> >                                             ^ current slot
> > 
> > This lets you find the first sibling, and you skip them all in order.
> > 
> > But in the case where one of the siblings is NULL, that slot is skipped and
> > then our sibling detection is interrupted:
> > 
> >                                            V preceding slot
> > struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
> >                                                   ^ current slot
> > 
> > This means that the sibling pointers aren't recognized since they point all
> > the way back to 'entry', so we think that they are normal internal radix
> > tree pointers.  This causes us to think we need to walk down to a struct
> > radix_tree_node starting at the address of 'entry'.
> > 
> > In a real running kernel this will crash the thread with a GP fault when
> > you try to dereference the slots in your broken node starting at 'entry'.
> > 
> > We fix this race by changing the way that skip_siblings() detects sibling
> > entries.  Instead of testing against the preceding slot, we look for
> > siblings via is_sibling_entry(), which compares against the position of the
> > struct radix_tree_node.slots[] array.  This ensures that sibling entries
> > are properly identified, even if they are no longer contiguous with the
> > 'entry' they point to.
> > 
> > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
> > Fixes: commit 148deab223b2 ("radix-tree: improve multiorder iterators")
> > Cc: <stable@vger.kernel.org>
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thank you for the review, Jan.
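
For anyone following the thread later, here is a small user-space sketch of
the race described in the commit message.  It is not the kernel code: the
struct, NSLOTS, SIB_TAG and every function name below are hypothetical
stand-ins, and the tagging scheme is simplified.  It only models the two
detection strategies (exact match on the preceding slot versus a range check
against the node's slots[] array) on an order 2 entry whose first sibling has
already been cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS  8                /* hypothetical; real nodes are larger */
#define SIB_TAG ((uintptr_t)1)   /* stand-in for the internal-entry tag */

struct node {
	void *slots[NSLOTS];
};

/* Build a sibling entry: a tagged pointer back to the head slot. */
static void *make_sibling(struct node *n, int head)
{
	assert(head >= 0 && head < NSLOTS);
	return (void *)((uintptr_t)&n->slots[head] | SIB_TAG);
}

/* Exact-match test: a sibling must point at the slot just before 'cur'. */
static int skip_exact(struct node *n, int cur)
{
	void *sib = make_sibling(n, cur - 1);

	for (; cur < NSLOTS; cur++)
		if (n->slots[cur] && n->slots[cur] != sib)
			return cur;
	return -1;
}

/* Range test: any pointer into this node's slots[] array is a sibling. */
static int is_sibling(struct node *n, void *entry)
{
	uintptr_t p = (uintptr_t)entry & ~SIB_TAG;

	return p >= (uintptr_t)&n->slots[0] &&
	       p <  (uintptr_t)&n->slots[NSLOTS];
}

static int skip_range(struct node *n, int cur)
{
	for (; cur < NSLOTS; cur++)
		if (n->slots[cur] && !is_sibling(n, n->slots[cur]))
			return cur;
	return -1;
}

/* Mimic the iterator: step past NULL slots, then run the sibling skip. */
static int next_slot(struct node *n, int cur, int use_range)
{
	for (cur++; cur < NSLOTS; cur++)
		if (n->slots[cur])
			return use_range ? skip_range(n, cur)
					 : skip_exact(n, cur);
	return -1;
}

/*
 * Build [entry][sib][sib][sib][other], optionally clear the first sibling
 * to model the window during deletion, and return the index of the slot
 * that the iterator treats as the next real entry after slot 0.
 */
static int first_after_head(int torn, int use_range)
{
	static int item_a, item_b;
	struct node n = { { NULL } };

	n.slots[0] = &item_a;
	for (int i = 1; i < 4; i++)
		n.slots[i] = make_sibling(&n, 0);
	n.slots[4] = &item_b;

	if (torn)
		n.slots[1] = NULL;

	return next_slot(&n, 0, use_range);
}
```

With the node intact, both variants report slot 4 as the next entry.  With
the first sibling cleared, first_after_head(1, 0) stops at slot 2, i.e. it
mistakes a sibling pointer for a real entry, while first_after_head(1, 1)
still reports slot 4.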
