All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Jann Horn <jannh@google.com>, joel@joelfernandes.org
Cc: "kernel list" <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kernel-team@android.com, "Minchan Kim" <minchan@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	lokeshgidra@google.com,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Kate Stewart" <kstewart@linuxfoundation.org>,
	pombredanne@nexb.com, "Thomas Gleixner" <tglx@linutronix.de>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH] mm: Speed up mremap on large regions
Date: Fri, 12 Oct 2018 07:29:42 +0200	[thread overview]
Message-ID: <42b81ac4-35de-754e-545b-d57b3bab3b7a@suse.com> (raw)
In-Reply-To: <CAG48ez3yLkMcyaTXFt_+w8_-HtmrjW=XB51DDQSGdjPj43XWmA@mail.gmail.com>

On 12/10/2018 05:21, Jann Horn wrote:
> +cc xen maintainers and kvm folks
> 
> On Fri, Oct 12, 2018 at 4:40 AM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
>> Android needs to mremap large regions of memory during memory management
>> related operations. The mremap system call can be really slow if THP is
>> not enabled. The bottleneck is move_page_tables, which is copying each
>> pte at a time, and can be really slow across a large map. Turning on THP
>> may not be a viable option, and is not for us. This patch speeds up the
>> performance for non-THP system by copying at the PMD level when possible.
> [...]
>> +bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>> +                 unsigned long new_addr, unsigned long old_end,
>> +                 pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
>> +{
> [...]
>> +       /*
>> +        * We don't have to worry about the ordering of src and dst
>> +        * ptlocks because exclusive mmap_sem prevents deadlock.
>> +        */
>> +       old_ptl = pmd_lock(vma->vm_mm, old_pmd);
>> +       if (old_ptl) {
>> +               pmd_t pmd;
>> +
>> +               new_ptl = pmd_lockptr(mm, new_pmd);
>> +               if (new_ptl != old_ptl)
>> +                       spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>> +
>> +               /* Clear the pmd */
>> +               pmd = *old_pmd;
>> +               pmd_clear(old_pmd);
>> +
>> +               VM_BUG_ON(!pmd_none(*new_pmd));
>> +
>> +               /* Set the new pmd */
>> +               set_pmd_at(mm, new_addr, new_pmd, pmd);
>> +               if (new_ptl != old_ptl)
>> +                       spin_unlock(new_ptl);
>> +               spin_unlock(old_ptl);
> 
> How does this interact with Xen PV? From a quick look at the Xen PV
> integration code in xen_alloc_ptpage(), it looks to me as if, in a
> config that doesn't use split ptlocks, this is going to temporarily
> drop Xen's type count for the page to zero, causing Xen to de-validate
> and then re-validate the L1 pagetable; if you first set the new pmd
> before clearing the old one, that wouldn't happen. I don't know how
> this interacts with shadow paging implementations.

No, this isn't an issue. As the L1 pagetable isn't being released it
will stay pinned, so there will be no need to revalidate it.

For Xen in shadow mode I'm quite sure it just doesn't matter. In the
case another thread of the process is accessing the memory in parallel
it might even be better to not having a L1 pagetable with 2 references
at the same time, but this is an academic problem which doesn't need to
be tuned for performance IMO.


Juergen

  reply	other threads:[~2018-10-12  5:29 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-09 20:14 [PATCH] mm: Speed up mremap on large regions Joel Fernandes (Google)
2018-10-09 21:38 ` Andrew Morton
2018-10-09 23:08   ` Joel Fernandes
2018-10-09 22:02 ` Kirill A. Shutemov
2018-10-09 23:04   ` Joel Fernandes
2018-10-10 10:00     ` Kirill A. Shutemov
2018-10-11  0:46       ` Joel Fernandes
2018-10-11  0:50         ` Joel Fernandes
2018-10-11  5:14         ` Kirill A. Shutemov
2018-10-11  8:11           ` Kirill A. Shutemov
2018-10-11  8:11             ` Kirill A. Shutemov
2018-10-12  1:47             ` Joel Fernandes
2018-10-12  1:47               ` Joel Fernandes
2018-10-11  8:17         ` Kirill A. Shutemov
2018-10-11 12:02           ` Michal Hocko
2018-10-12  3:21 ` Jann Horn
2018-10-12  5:29   ` Juergen Gross [this message]
2018-10-12  5:34     ` Jann Horn
2018-10-12  7:29       ` Juergen Gross
2018-10-12  7:24   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42b81ac4-35de-754e-545b-d57b3bab3b7a@suse.com \
    --to=jgross@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=kernel-team@android.com \
    --cc=kstewart@linuxfoundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=minchan@google.com \
    --cc=pbonzini@redhat.com \
    --cc=pombredanne@nexb.com \
    --cc=rkrcmar@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.