From: Alex Williamson <alex.williamson@redhat.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [BUG] vfio device assignment regression with THP ref counting redesign
Date: Fri, 29 Apr 2016 09:12:57 -0600 [thread overview]
Message-ID: <20160429091257.76e953b7@t450s.home> (raw)
In-Reply-To: <20160429070611.GA4990@node.shutemov.name>
On Fri, 29 Apr 2016 10:06:11 +0300
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Thu, Apr 28, 2016 at 08:45:42PM -0600, Alex Williamson wrote:
> > On Fri, 29 Apr 2016 03:51:06 +0300
> > "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> >
> > > On Fri, Apr 29, 2016 at 01:21:27AM +0200, Andrea Arcangeli wrote:
> > > > Hello Alex and Kirill,
> > > >
> > > > On Thu, Apr 28, 2016 at 12:58:08PM -0600, Alex Williamson wrote:
> > > > > > > specific fix to this code is not applicable. It also still occurs on
> > > > > > > kernels as recent as v4.6-rc5, so the issue hasn't been silently fixed
> > > > > > > yet. I'm able to reproduce this fairly quickly with the above test,
> > > > > > > but it's not hard to imagine a test w/o any iommu dependencies which
> > > > > > > simply does a user directed get_user_pages_fast() on a set of userspace
> > > > > > > addresses, retains the reference, and at some point later rechecks that
> > > > > > > a new get_user_pages_fast() results in the same page address. It
> > > >
> > > > Can you try to "git revert 1f25fe20a76af0d960172fb104d4b13697cafa84"
> > > > and then apply the below patch on top of the revert?
> > > >
> > > > Totally untested... if I missed something and it isn't correct, I hope
> > > > this brings us in the right direction faster at least.
> > > >
> > > > Overall the problem I think is that we need to restore full accuracy
> > > > and we can't deal with false positive COWs (which aren't entirely
> > > > cheap either... reading 512 cachelines should be much faster than
> > > > copying 2MB and using 4MB of CPU cache). 32k vs 4MB. The problem of
> > > > course is when we really need a COW, we'll waste an additional 32k,
> > > > but then it doesn't matter that much as we'd be forced to load 4MB of
> > > > cache anyway in such case. There's room for optimizations but even the
> > > > simple below patch would be ok for now.
> > > >
> > > > From 09e3d1ff10b49fb9c3ab77f0b96a862848e30067 Mon Sep 17 00:00:00 2001
> > > > From: Andrea Arcangeli <aarcange@redhat.com>
> > > > Date: Fri, 29 Apr 2016 01:05:06 +0200
> > > > Subject: [PATCH 1/1] mm: thp: calculate page_mapcount() correctly for THP
> > > > pages
> > > >
> > > > This allows to revert commit 1f25fe20a76af0d960172fb104d4b13697cafa84
> > > > and it provides fully accuracy with wrprotect faults so page pinning
> > > > will stop causing false positive copy-on-writes.
> > > >
> > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > > > ---
> > > > mm/util.c | 5 +++--
> > > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/util.c b/mm/util.c
> > > > index 6cc81e7..a0b9f63 100644
> > > > --- a/mm/util.c
> > > > +++ b/mm/util.c
> > > > @@ -383,9 +383,10 @@ struct address_space *page_mapping(struct page *page)
> > > > /* Slow path of page_mapcount() for compound pages */
> > > > int __page_mapcount(struct page *page)
> > > > {
> > > > - int ret;
> > > > + int ret = 0, i;
> > > >
> > > > - ret = atomic_read(&page->_mapcount) + 1;
> > > > + for (i = 0; i < HPAGE_PMD_NR; i++)
> > > > + ret = max(ret, atomic_read(&page->_mapcount) + 1);
> > > > page = compound_head(page);
> > > > ret += atomic_read(compound_mapcount_ptr(page)) + 1;
> > > > if (PageDoubleMap(page))
> > >
> > > You are right about the cause. I spend some time on wrong path: I was only
> > > able to trigger the bug with numa balancing enabled, so I assumed
> > > something is wrong in that code...
> > >
> > > I would like to preserve current page_mapcount() behaviouts.
> > > I think this fix is better:
> >
> > This also seems to work in my testing, but assuming all else being
> > equal, there is a performance difference between the two for this test
> > case in favor of Andrea's solution. Modifying the test to exit after
> > the first set of iterations, my system takes on average 107s to complete
> > with the solution below or 103.5s with the other approach. Please note
> > that I have every mm debugging option I could find enabled and THP
> > scanning full speed on the system, so I don't know how this would play
> > out in a more tuned configuration.
> >
> > The only reason I noticed is that I added a side test to sleep a random
> > number of seconds and kill the test program because sometimes killing
> > the test triggers errors. I didn't see any errors with either of these
> > solutions, but suspected the first solution was completing more
> > iterations for similar intervals. Modifying the test to exit seems to
> > prove that true.
> >
> > I can't speak to which is the more architecturally correct solution,
> > but there may be a measurable performance difference to consider.
>
> Hm. I just woke up and haven't got any coffee yet, but I don't why my
> approach would be worse for performance. Both have the same algorithmic
> complexity.
I can't explain it either, I won't claim to understand either solution,
but with all the kernel hacking vm debug/sanity options disabled, there
still appears to be a very slight advantage to Andrea's proposal. I
expect the test program should show this even if you're having a
difficult time using it to reproduce the bug. I ran Andrea's patch
overnight, no issues reported. Running your patch for an extended test
now. Thanks,
Alex
next prev parent reply other threads:[~2016-04-29 15:13 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-28 16:20 [BUG] vfio device assignment regression with THP ref counting redesign Alex Williamson
2016-04-28 18:17 ` Kirill A. Shutemov
2016-04-28 18:58 ` Alex Williamson
2016-04-28 23:21 ` Andrea Arcangeli
2016-04-29 0:44 ` Alex Williamson
2016-04-29 0:51 ` Kirill A. Shutemov
2016-04-29 2:45 ` Alex Williamson
2016-04-29 7:06 ` Kirill A. Shutemov
2016-04-29 15:12 ` Alex Williamson [this message]
2016-04-29 16:34 ` Andrea Arcangeli
2016-04-29 22:34 ` Alex Williamson
2016-05-02 10:41 ` Kirill A. Shutemov
2016-05-02 11:15 ` Jerome Glisse
2016-05-02 12:14 ` GUP guarantees wrt to userspace mappings redesign Kirill A. Shutemov
2016-05-02 13:39 ` Jerome Glisse
2016-05-02 15:00 ` GUP guarantees wrt to userspace mappings Kirill A. Shutemov
2016-05-02 15:22 ` Jerome Glisse
2016-05-02 16:12 ` Kirill A. Shutemov
2016-05-02 19:14 ` Andrea Arcangeli
2016-05-02 19:11 ` Andrea Arcangeli
2016-05-02 19:02 ` Andrea Arcangeli
2016-05-02 14:15 ` GUP guarantees wrt to userspace mappings redesign Oleg Nesterov
2016-05-02 16:21 ` Kirill A. Shutemov
2016-05-02 16:22 ` Oleg Nesterov
2016-05-02 18:03 ` Kirill A. Shutemov
2016-05-02 17:41 ` Oleg Nesterov
2016-05-02 18:56 ` Andrea Arcangeli
2016-05-02 15:23 ` [BUG] vfio device assignment regression with THP ref counting redesign Andrea Arcangeli
2016-05-02 16:00 ` Kirill A. Shutemov
2016-05-02 18:03 ` Andrea Arcangeli
2016-05-05 1:19 ` Alex Williamson
2016-05-05 14:39 ` Andrea Arcangeli
2016-05-05 15:09 ` Andrea Arcangeli
2016-05-05 15:11 ` Kirill A. Shutemov
2016-05-05 15:24 ` Andrea Arcangeli
2016-05-06 7:29 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160429091257.76e953b7@t450s.home \
--to=alex.williamson@redhat.com \
--cc=aarcange@redhat.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).