linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
	akpm@linux-foundation.org, kirill@shutemov.name,
	ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net,
	jack@suse.cz, Matthew Wilcox <willy@infradead.org>,
	benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	hpa@zytor.com, Will Deacon <will.deacon@arm.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
	npiggin@gmail.com, bsingharora@gmail.com,
	Tim Chen <tim.c.chen@linux.intel.com>,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v5 07/22] mm: Protect VMA modifications using VMA sequence count
Date: Mon, 6 Nov 2017 10:47:00 +0100	[thread overview]
Message-ID: <c774dc09-dc3d-b721-8339-081e7584b709@linux.vnet.ibm.com> (raw)
In-Reply-To: <20171102200840.GC22686@redhat.com>

Hi Andrea,

On 02/11/2017 21:08, Andrea Arcangeli wrote:
> On Thu, Nov 02, 2017 at 06:25:11PM +0100, Laurent Dufour wrote:
>> I think there is some memory barrier missing when the VMA is modified so
>> currently the modifications done in the VMA structure may not be written
>> down at the time the pte is locked. So doing that change will also requires
>> to call smp_wmb() before locking the page tables. In the current patch this
>> is ensured by the call to write_seqcount_end().
>> Doing so will still require to have a memory barrier when touching the VMA.
>> Not sure we get far better performance compared to the sequence count
>> change. But I'll give it a try anyway ;)
> 
> Luckily smp_wmb is a noop on x86. I would suggest to ignore the above
> issue completely if you give it a try, and then if this performs, we
> can just embed a smp_wmb() before spin_lock() somewhere in
> pte_offset_map_lock/pte_lockptr/spin_lock_nested for those archs whose
> spin_lock isn't a smp_wmb() equivalent. I would focus at flushing
> writes before every pagetable spin_lock for non-x86 archs, rather than
> after all vma modifications. That should be easier to keep under
> control and it's going to be more efficient too as if something there
> are fewer spin locks than vma modifications.

I do agree that would simplify the patch series a lot.
I'll double check that pte lock is not done in a loop other wise having
smp_wmb() there will be bad.

Another point I'm trying to double check is that we may have inconsistency
while reading the vma's flags in the page fault path until the memory
barrier got it in the VMA's changing path. Especially we may have vm_flags
and vm_page_prot not matching at all, which couldn't happen when checking
for the vm_sequence count.

> 
> For non-x86 archs we may then need a smp_wmb__before_spin_lock. That
> looks more self contained than surrounding all vma modifications and
> it's a noop on x86 anyway.
> 
> I thought about the contention detection logic too yesterday: to
> detect contention we could have a mm->mmap_sem_contention_jiffies and
> if down_read_trylock_exclusive() [same as down_read_if_not_hold in
> prev mail] fails (and it'll fail if either read or write mmap_sem is
> hold, so also convering mremap/mprotect etc..) we set
> mm->mmap_sem_contention_jiffies = jiffies and then to know if you must
> not touch the mmap_sem at all, you compare jiffies against
> mmap_sem_contention_jiffies, if it's equal we go speculative. If
> that's not enough we can just keep going speculative for a few more
> jiffies with time_before(). The srcu lock is non concerning because the
> inc/dec of the fast path is in per-cpu cacheline of course, no false
> sharing possible there or it wouldn't be any better than a normal lock.

I'm sorry, I should have missed something here. I can't see how this would
help fixing the case where a thread is entering the page fault handler
seeing that no one else has the mmap_sem and then grab it. While it is
processing the page fault another thread is entering mprotect for instance
and thus will wait for the mmap_sem to be released by the thread processing
the page fault.

Cheers,
Laurent.

> The vma revalidation is already done by khugepaged and mm/userfaultfd,
> both need to drop the mmap_sem and continue working on the pagetables,
> so we already know it's workable and not too slow.
> 
> Summarizing.. by using a runtime contention triggered speculative
> design that goes speculative only when contention is runtime-detected
> using the above logic (or equivalent), and by having to revalidate the
> vma by hand with find_vma without knowing instantly if the vma become
> stale, we will run with a substantially slower speculative page fault
> than with your current speculative always-on design, but the slower
> speculative page fault runtime will still scale 100% in SMP so it
> should still be faster on large SMP systems. The pros is that it won't
> regress the mmap/brk vma modifications. The whole complexity of
> tracking the vma modifications should also go away and the resulting
> code should be more maintainable and less risky to break in subtle
> ways impossible to reproduce.
> 
> Thanks!
> Andrea
> 

  reply	other threads:[~2017-11-06  9:47 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-11 13:52 [PATCH v5 00/22] Speculative page faults Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 01/22] x86/mm: Define CONFIG_SPF Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 02/22] powerpc/mm: " Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 03/22] mm: Dont assume page-table invariance during faults Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 04/22] mm: Prepare for FAULT_FLAG_SPECULATIVE Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 05/22] mm: Introduce pte_spinlock " Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 06/22] mm: VMA sequence count Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 07/22] mm: Protect VMA modifications using " Laurent Dufour
2017-10-26 10:18   ` Andrea Arcangeli
2017-11-02 15:16     ` Laurent Dufour
2017-11-02 17:25       ` Laurent Dufour
2017-11-02 20:08         ` Andrea Arcangeli
2017-11-06  9:47           ` Laurent Dufour [this message]
2017-10-11 13:52 ` [PATCH v5 08/22] mm: RCU free VMAs Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 09/22] mm: Cache some VMA fields in the vm_fault structure Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 10/22] mm: Protect SPF handler against anon_vma changes Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 11/22] mm/migrate: Pass vm_fault pointer to migrate_misplaced_page() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 12/22] mm: Introduce __lru_cache_add_active_or_unevictable Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 13/22] mm: Introduce __maybe_mkwrite() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 14/22] mm: Introduce __vm_normal_page() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 15/22] mm: Introduce __page_add_new_anon_rmap() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 16/22] mm: Provide speculative fault infrastructure Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 17/22] mm: Try spin lock in speculative path Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 18/22] mm: Adding speculative page fault failure trace events Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 19/22] perf: Add a speculative page fault sw event Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 20/22] perf tools: Add support for the SPF perf event Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 21/22] x86/mm: Add speculative pagefault handling Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 22/22] powerpc/mm: Add speculative page fault Laurent Dufour
2017-10-26  8:14   ` [v5,22/22] " kemi
2017-11-02 14:11     ` Laurent Dufour
2017-11-06 10:27       ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c774dc09-dc3d-b721-8339-081e7584b709@linux.vnet.ibm.com \
    --to=ldufour@linux.vnet.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=bsingharora@gmail.com \
    --cc=dave@stgolabs.net \
    --cc=haren@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jack@suse.cz \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).