From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Dave Jiang <dave.jiang@intel.com>
Subject: Re: dax_lock_mapping_entry was never safe
Date: Mon, 26 Nov 2018 12:36:26 -0800
Message-ID: <CAPcyv4j8Qo0rZniWcjrtScBWNsG6=geyZU1yfRK=4wGsJ5=e8A@mail.gmail.com>
In-Reply-To: <20181126171137.GD25835@quack2.suse.cz>

On Mon, Nov 26, 2018 at 9:11 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 26-11-18 08:12:40, Matthew Wilcox wrote:
> >
> > I noticed this path while I was doing the 4.19 backport of
> > dax: Avoid losing wakeup in dax_lock_mapping_entry
> >
> >                 xa_unlock_irq(&mapping->i_pages);
> >                 revalidate = wait_fn();
> >                 finish_wait(wq, &ewait.wait);
> >                 xa_lock_irq(&mapping->i_pages);
>
> I guess this is a snippet from get_unlocked_entry(), isn't it?
>
> > It's not safe to call xa_lock_irq() if mapping could have been freed while
> > we slept.  We'll probably get away with it; most filesystems use a unique
> > slab for their inodes, so you'll likely get either a freed inode or an
> > inode which is now the wrong inode.  But if that page has been freed back
> > to the page allocator, that pointer could now be pointing at anything.
>
> Correct. Thanks for catching this bug!

Yes, nice catch!

>
> > Fixing this in the current codebase is no easier than fixing it in the
> > 4.19 codebase.  This is the best I've come up with.  Could we do better
> > by not using the _exclusive form of prepare_to_wait()?  I'm not familiar
> > with all the things that need to be considered when using this family
> > of interfaces.
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 9bcce89ea18e..154b592b18eb 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas)
> >       }
> >  }
> >
> > +static void wait_unlocked_entry(struct xa_state *xas, void *entry)
> > +{
> > +     struct wait_exceptional_entry_queue ewait;
> > +     wait_queue_head_t *wq;
> > +
> > +     init_wait(&ewait.wait);
> > +     ewait.wait.func = wake_exceptional_entry_func;
> > +
> > +     wq = dax_entry_waitqueue(xas, entry, &ewait.key);
> > +     prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
> > +     xas_unlock_irq(xas);
> > +     /* We can no longer look at xas */
> > +     schedule();
> > +     finish_wait(wq, &ewait.wait);
> > +     if (waitqueue_active(wq))
> > +             __wake_up(wq, TASK_NORMAL, 1, &ewait.key);
> > +}
> > +
>
> The code looks good. Maybe we can call this wait_entry_unlocked() to stress
> that entry is not really usable after this function returns? And add a
> comment before the function saying that this is safe to call even if we
> don't have a reference keeping mapping alive?

Yes, maybe even something more ambiguous like "wait_entry_event()",
because there's no guarantee the entry is unlocked, just that now is a
good time to try to interrogate the entry again.
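
To make the "event, not unlocked" point concrete, here is a rough
userspace model using pthreads. It is only a sketch under assumptions:
struct obj, wq_lock, wq_cond, unlock_entry(), lock_entry() and
unlocker_thread() are hypothetical stand-ins, not anything in fs/dax.c.
The global mutex/condvar pair plays the role of the wait table (which
outlives any mapping), and obj->lock plays the role of the i_pages lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a global waitqueue that outlives every object. */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_cond = PTHREAD_COND_INITIALIZER;

struct obj {
	pthread_mutex_t lock;	/* lives inside the object, like the i_pages lock */
	bool entry_locked;
};

/*
 * Called with o->lock held. Drops it and never touches *o again, so it
 * stays safe even if *o is freed while we sleep. A return only means
 * "something happened", not "the entry is unlocked".
 */
static void wait_entry_event(struct obj *o)
{
	pthread_mutex_lock(&wq_lock);	/* get on the queue before unlocking */
	pthread_mutex_unlock(&o->lock);
	/* We can no longer look at o */
	pthread_cond_wait(&wq_cond, &wq_lock);
	pthread_mutex_unlock(&wq_lock);
}

static void unlock_entry(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->entry_locked = false;
	pthread_mutex_unlock(&o->lock);

	/* Broadcast, so no "wake the next waiter" step is needed here. */
	pthread_mutex_lock(&wq_lock);
	pthread_cond_broadcast(&wq_cond);
	pthread_mutex_unlock(&wq_lock);
}

/*
 * Caller side: after an event, interrogate the entry again under the
 * lock. This model assumes *o stays alive; the kernel caller would have
 * to revalidate page->mapping before touching the mapping again.
 */
static void lock_entry(struct obj *o)
{
	for (;;) {
		pthread_mutex_lock(&o->lock);
		if (!o->entry_locked) {
			o->entry_locked = true;
			pthread_mutex_unlock(&o->lock);
			return;
		}
		wait_entry_event(o);	/* returns with o->lock dropped */
	}
}

/* Thread body for trying the model out: releases the entry once. */
static void *unlocker_thread(void *arg)
{
	unlock_entry(arg);
	return NULL;
}
```

Unlike the patch, this model does not use an exclusive wait, so it
sidesteps the question of passing the wakeup on to the next waiter.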

>
> >  static void put_unlocked_entry(struct xa_state *xas, void *entry)
> >  {
> >       /* If we were the only waiter woken, wake the next one */
> > @@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page)
> >               entry = xas_load(&xas);
> >               if (dax_is_locked(entry)) {
> >                       rcu_read_unlock();
> > -                     entry = get_unlocked_entry(&xas);
> > -                     xas_unlock_irq(&xas);
> > -                     put_unlocked_entry(&xas, entry);
> > +                     wait_unlocked_entry(&xas, entry);
> >                       rcu_read_lock();
> >                       continue;
>
> The continue here actually is not safe either because if the mapping got
> freed, page->mapping will be NULL and we oops at the beginning of the loop.
> So that !dax_mapping() check should also check for mapping != NULL.

Yes.
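
For the shape of that check, a minimal userspace sketch. The structs
are simplified stand-ins for the kernel types, the is_dax flag is a
hypothetical replacement for IS_DAX(), and dax_mapping_sketch() and
page_has_dax_mapping() are made-up names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode { bool is_dax; };
struct address_space { struct inode *host; };
struct page { struct address_space *mapping; };

/* Modeled on dax_mapping(): dereferences mapping, so it must not be NULL. */
static bool dax_mapping_sketch(const struct address_space *mapping)
{
	return mapping->host && mapping->host->is_dax;
}

/*
 * Read page->mapping once, then test for NULL before the DAX check.
 * (The kernel would use READ_ONCE() for the load; plain C is used here.)
 */
static bool page_has_dax_mapping(const struct page *page)
{
	const struct address_space *mapping = page->mapping;

	return mapping != NULL && dax_mapping_sketch(mapping);
}
```

With the NULL test first, a page whose mapping was torn down while we
slept makes the loop bail out instead of oopsing.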


Thread overview: 4+ messages
2018-11-26 16:12 dax_lock_mapping_entry was never safe Matthew Wilcox
2018-11-26 17:11 ` Jan Kara
2018-11-26 20:36   ` Dan Williams [this message]
2018-11-27 18:59     ` Matthew Wilcox
