linux-kernel.vger.kernel.org archive mirror
* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found]     ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
@ 2015-03-12 20:56       ` Mathieu Desnoyers
  2015-03-12 21:12         ` Paul E. McKenney
  2015-03-12 23:59         ` One Thousand Gnomes
  2015-03-12 21:47       ` Linus Torvalds
  1 sibling, 2 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 20:56 UTC
  To: Michael Sullivan
  Cc: Peter Zijlstra, LKML, Steven Rostedt, lttng-dev, Thomas Gleixner,
	Paul E. McKenney, Linus Torvalds, Ingo Molnar

(sorry for re-send, my mail client tricked me into posting HTML
to lkml)

Hi, 

Michael Sullivan proposed a clever hack abusing mprotect() to 
perform the same effect as sys_membarrier() I submitted a few 
years ago ( https://lkml.org/lkml/2010/4/18/15 ). 

At that time, the sys_membarrier implementation was deemed 
technically sound, but there were not enough users of the system call 
to justify its inclusion. 

So far, the number of users of liburcu has increased, but liburcu 
still appears to be the only direct user of sys_membarrier. On this 
front, we could argue that many other system calls have only 
one user: glibc. In that respect, liburcu is quite similar to glibc. 

So the question as it stands appears to be: would you be comfortable 
having users abuse mprotect(), relying on its side-effect of issuing 
a smp_mb() on each targeted CPU for the TLB shootdown, as 
an effective implementation of process-wide memory barrier ? 

Thoughts ? 

Thanks! 

Mathieu 





From: "Michael Sullivan" <sully@msully.net> 
To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com> 
Cc: lttng-dev@lists.lttng.org 
Sent: Thursday, March 12, 2015 12:04:07 PM 
Subject: Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu 

On Thu, Mar 12, 2015 at 10:57 AM, Mathieu Desnoyers < mathieu.desnoyers@efficios.com > wrote: 




> Even though it depends on internal behavior not currently specified by mprotect,
> I'd very much like to see the prototype you have,


I ended up posting my code at https://github.com/msullivan/userspace-rcu/tree/msync-barrier . 
The interesting patch is https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0 . 

Quick blog post I wrote about it at http://www.msully.net/blog/2015/02/24/forcing-memory-barriers-on-other-cpus-with-mprotect2/ . 
(I talked briefly about sys_membarrier in the post as best as I could piece together from LKML; if my comment on it is inaccurate I can edit the post.) 

-Michael Sullivan 



-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 20:56       ` Alternative to signals/sys_membarrier() in liburcu Mathieu Desnoyers
@ 2015-03-12 21:12         ` Paul E. McKenney
  2015-03-14 21:06           ` Benjamin Herrenschmidt
  2015-03-12 23:59         ` One Thousand Gnomes
  1 sibling, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2015-03-12 21:12 UTC
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt,
	lttng-dev, Thomas Gleixner, Linus Torvalds, Ingo Molnar,
	linux-arch

On Thu, Mar 12, 2015 at 08:56:00PM +0000, Mathieu Desnoyers wrote:
> (sorry for re-send, my mail client tricked me into posting HTML
> to lkml)
> 
> Hi, 
> 
> Michael Sullivan proposed a clever hack abusing mprotect() to 
> perform the same effect as sys_membarrier() I submitted a few 
> years ago ( https://lkml.org/lkml/2010/4/18/15 ). 
> 
> At that time, the sys_membarrier implementation was deemed 
> technically sound, but there were not enough users of the system call 
> to justify its inclusion. 
> 
> So far, the number of users of liburcu has increased, but liburcu 
> still appears to be the only direct user of sys_membarrier. On this 
> front, we could argue that many other system calls have only 
> one user: glibc. In that respect, liburcu is quite similar to glibc. 
> 
> So the question as it stands appears to be: would you be comfortable 
> having users abuse mprotect(), relying on its side-effect of issuing 
> a smp_mb() on each targeted CPU for the TLB shootdown, as 
> an effective implementation of process-wide memory barrier ? 
> 
> Thoughts ? 

Are there any architectures left that use hardware-assisted global
TLB invalidation?  On such an architecture, you might not get a memory
barrier except on the CPU executing the mprotect() or munmap().

(Here is hoping that no one does -- it is a cute abuse^Whack otherwise!)

							Thanx, Paul



* Re: Alternative to signals/sys_membarrier() in liburcu
       [not found]     ` <666590480.287502.1426193588471.JavaMail.zimbra@efficios.com>
  2015-03-12 20:56       ` Alternative to signals/sys_membarrier() in liburcu Mathieu Desnoyers
@ 2015-03-12 21:47       ` Linus Torvalds
  2015-03-12 22:30         ` Mathieu Desnoyers
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2015-03-12 21:47 UTC
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, lttng-dev, LKML, Paul E. McKenney,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Steven Rostedt

On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> So the question as it stands appears to be: would you be comfortable
> having users abuse mprotect(), relying on its side-effect of issuing
> a smp_mb() on each targeted CPU for the TLB shootdown, as
> an effective implementation of process-wide memory barrier ?

Be *very* careful.

Just yesterday, in another thread (discussing the auto-numa TLB
performance regression), we were discussing skipping the TLB
invalidates entirely if the mprotect relaxes the protections.

Because if you *used* to be read-only, and then mprotect() something
so that it is read-write, there really is no need to send a TLB
invalidate, at least on x86. You can just change the page tables, and
*if* any entries are stale in the TLB they'll take a microfault on
access and then just reload the TLB.

So mprotect() to a more permissive mode is not necessarily serializing.

Also, you need to make sure that your page is actually in memory,
because otherwise the kernel may end up seeing "oh, it's not even
present", and never flush the TLB at all.

So now you need to mlock that page. Which can be problematic for non-root.
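
To make these constraints concrete, here is a minimal sketch of what a
user-space caller would have to do. It is not Michael's patch and not
liburcu code; the function names are hypothetical. It assumes that
tightening protections on a resident, recently written page makes the
kernel flush remote TLBs, which is exactly the undocumented behavior
being questioned in this thread.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration; not from liburcu or the patch. */
static void *barrier_page;
static size_t barrier_page_size;

/* Map one private page and pin it so it cannot be paged out. */
int mprotect_barrier_init(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    if (sz <= 0)
        return -1;
    barrier_page_size = (size_t)sz;
    barrier_page = mmap(NULL, barrier_page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (barrier_page == MAP_FAILED) {
        barrier_page = NULL;
        return -1;
    }
    /* May fail for unprivileged users with a low RLIMIT_MEMLOCK. */
    if (mlock(barrier_page, barrier_page_size) != 0) {
        munmap(barrier_page, barrier_page_size);
        barrier_page = NULL;
        return -1;
    }
    return 0;
}

/* Attempt the trick: relax, touch, then tighten the protection. */
int mprotect_barrier(void)
{
    if (barrier_page == NULL)
        return -1;
    /* Relaxing first; this step is not expected to flush anything. */
    if (mprotect(barrier_page, barrier_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    /* Write to the page so a present, writable mapping exists. */
    *(volatile char *)barrier_page = 1;
    /* Tightening is the transition the trick relies on (ASSUMPTION). */
    if (mprotect(barrier_page, barrier_page_size, PROT_READ) != 0)
        return -1;
    return 0;
}
```

A return value of 0 only means the system calls succeeded. Nothing in
the mprotect() interface promises that a barrier was executed on any
other CPU.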

In other words, I'd be a bit leery about it. There may be other
gotchas about it.

                      Linus


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 21:47       ` Linus Torvalds
@ 2015-03-12 22:30         ` Mathieu Desnoyers
  2015-03-13  8:07           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Mathieu Desnoyers @ 2015-03-12 22:30 UTC
  To: Linus Torvalds
  Cc: Michael Sullivan, lttng-dev, LKML, Paul E. McKenney,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Steven Rostedt

----- Original Message -----
> From: "Linus Torvalds" <torvalds@linux-foundation.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> Sent: Thursday, March 12, 2015 5:47:05 PM
> Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> 
> On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > So the question as it stands appears to be: would you be comfortable
> > having users abuse mprotect(), relying on its side-effect of issuing
> > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > an effective implementation of process-wide memory barrier ?
> 
> Be *very* careful.
> 
> Just yesterday, in another thread (discussing the auto-numa TLB
> performance regression), we were discussing skipping the TLB
> invalidates entirely if the mprotect relaxes the protections.
> 
> Because if you *used* to be read-only, and then mprotect() something
> so that it is read-write, there really is no need to send a TLB
> invalidate, at least on x86. You can just change the page tables, and
> *if* any entries are stale in the TLB they'll take a microfault on
> access and then just reload the TLB.
> 
> So mprotect() to a more permissive mode is not necessarily serializing.

The idea here is to always mprotect() to a more restrictive mode,
which should trigger the TLB shootdown.

> 
> Also, you need to make sure that your page is actually in memory,
> because otherwise the kernel may end up seeing "oh, it's not even
> present", and never flush the TLB at all.
> 
> So now you need to mlock that page. Which can be problematic for non-root.

I'm aware the default amount of locked memory is usually quite low
(64kB here). So we'd need to handle cases where we run out of locked
memory. We could fall back to a slower userspace RCU scheme if this
occurs.
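
As a rough illustration of that fallback, the sketch below tries to pin
a single page and reports which scheme the caller should use. The enum
and function names are hypothetical, and the "slower scheme" is left to
the caller; this is not how liburcu is structured.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration only. */
enum barrier_scheme {
    SCHEME_MPROTECT,  /* a pinned page is available for the trick */
    SCHEME_FALLBACK   /* use a slower scheme, e.g. explicit barriers */
};

/*
 * Try to obtain one locked page. On success, *page_out points to it.
 * On any failure (including running out of locked memory), *page_out
 * is NULL and the caller is told to fall back.
 */
enum barrier_scheme choose_barrier_scheme(void **page_out)
{
    long sz = sysconf(_SC_PAGESIZE);
    void *page;

    *page_out = NULL;
    if (sz <= 0)
        return SCHEME_FALLBACK;

    page = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return SCHEME_FALLBACK;

    /* mlock() fails when the locked-memory limit is exhausted. */
    if (mlock(page, (size_t)sz) != 0) {
        munmap(page, (size_t)sz);
        return SCHEME_FALLBACK;
    }

    *page_out = page;
    return SCHEME_MPROTECT;
}
```

Attempting mlock() and checking the result, rather than inspecting the
limit up front, also covers processes that are allowed to exceed the
limit.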

> 
> In other words, I'd be a bit leery about it. There may be other
> gotchas about it.

Looking again at this old proposed patch (https://lkml.org/lkml/2010/4/18/15)
which adds a few memory barriers around updates to mm_cpumask
for sys_membarrier makes me wonder whether mprotect() may not skip
some CPU from the mask that would actually need to be taken care of
in very narrow race scenarios.

Thanks,

Mathieu


> 
>                       Linus
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 20:56       ` Alternative to signals/sys_membarrier() in liburcu Mathieu Desnoyers
  2015-03-12 21:12         ` Paul E. McKenney
@ 2015-03-12 23:59         ` One Thousand Gnomes
  2015-03-13  0:43           ` Mathieu Desnoyers
  1 sibling, 1 reply; 10+ messages in thread
From: One Thousand Gnomes @ 2015-03-12 23:59 UTC
  To: Mathieu Desnoyers
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt,
	lttng-dev, Thomas Gleixner, Paul E. McKenney, Linus Torvalds,
	Ingo Molnar

On Thu, 12 Mar 2015 20:56:00 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> (sorry for re-send, my mail client tricked me into posting HTML
> to lkml)
> 
> Hi, 
> 
> Michael Sullivan proposed a clever hack abusing mprotect() to 
> perform the same effect as sys_membarrier() I submitted a few 
> years ago ( https://lkml.org/lkml/2010/4/18/15 ). 
> 
> At that time, the sys_membarrier implementation was deemed 
> technically sound, but there were not enough users of the system call 
> to justify its inclusion. 
> 
> So far, the number of users of liburcu has increased, but liburcu 
> still appears to be the only direct user of sys_membarrier. On this 
> front, we could argue that many other system calls have only 
> one user: glibc. In that respect, liburcu is quite similar to glibc. 
> 
> So the question as it stands appears to be: would you be comfortable 
> having users abuse mprotect(), relying on its side-effect of issuing 
> a smp_mb() on each targeted CPU for the TLB shootdown, as 
> an effective implementation of process-wide memory barrier ? 

What are you going to do if some future ARM or x86 CPU update with
hardware TLB shootdown appears ? All your code will start to fail on new
kernels using that property, and in nasty insidious ways.

Also doesn't sun4d have hardware shootdown for 16 processors or less ?

I would have thought a membarrier was a lot safer and it can be made to
do whatever horrible things are needed on different processors (indeed it
could even be a pure libc hotpath if some future cpu grows this ability)

Alan


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 23:59         ` One Thousand Gnomes
@ 2015-03-13  0:43           ` Mathieu Desnoyers
  0 siblings, 0 replies; 10+ messages in thread
From: Mathieu Desnoyers @ 2015-03-13  0:43 UTC
  To: One Thousand Gnomes
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt,
	lttng-dev, Thomas Gleixner, Paul E. McKenney, Linus Torvalds,
	Ingo Molnar

----- Original Message -----
> From: "One Thousand Gnomes" <gnomes@lxorguk.ukuu.org.uk>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Michael Sullivan" <sully@msully.net>, "Peter Zijlstra" <peterz@infradead.org>, "LKML"
> <linux-kernel@vger.kernel.org>, "Steven Rostedt" <rostedt@goodmis.org>, lttng-dev@lists.lttng.org, "Thomas Gleixner"
> <tglx@linutronix.de>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, "Linus Torvalds"
> <torvalds@linux-foundation.org>, "Ingo Molnar" <mingo@kernel.org>
> Sent: Thursday, March 12, 2015 7:59:38 PM
> Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> 
> On Thu, 12 Mar 2015 20:56:00 +0000 (UTC)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > (sorry for re-send, my mail client tricked me into posting HTML
> > to lkml)
> > 
> > Hi,
> > 
> > Michael Sullivan proposed a clever hack abusing mprotect() to
> > perform the same effect as sys_membarrier() I submitted a few
> > years ago ( https://lkml.org/lkml/2010/4/18/15 ).
> > 
> > At that time, the sys_membarrier implementation was deemed
> > technically sound, but there were not enough users of the system call
> > to justify its inclusion.
> > 
> > So far, the number of users of liburcu has increased, but liburcu
> > still appears to be the only direct user of sys_membarrier. On this
> > front, we could argue that many other system calls have only
> > one user: glibc. In that respect, liburcu is quite similar to glibc.
> > 
> > So the question as it stands appears to be: would you be comfortable
> > having users abuse mprotect(), relying on its side-effect of issuing
> > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > an effective implementation of process-wide memory barrier ?
> 
> What are you going to do if some future ARM or x86 CPU update with
> hardware TLB shootdown appears ? All your code will start to fail on new
> kernels using that property, and in nasty insidious ways.

I'd claim that removing the IPIs breaks userspace, of course. :-P

If we start relying on mprotect() implying memory barriers issued
on all CPUs associated with the memory mapping in core user-space
libraries, then whenever those shiny new CPUs show up, we might be
stuck with the IPIs, otherwise we could claim that removing them
breaks userspace. I would really hate to tie in an assumption like
that on mprotect, because that would really be painting ourselves into
a corner.

> 
> Also doesn't sun4d have hardware shootdown for 16 processors or less ?

That's possible. I'm no sun expert though.

> 
> I would have thought a membarrier was a lot safer and it can be made to
> do whatever horrible things are needed on different processors (indeed it
> could even be a pure libc hotpath if some future cpu grows this ability)

I'd really prefer a well-documented system call for that purpose too.
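
For comparison, here is a sketch of what calling such a system call
could look like from user space. No interface had been merged when this
thread was written; the sketch assumes the membarrier(2) call that
Linux gained later (4.3 onward), and spells out the two command values
locally under hypothetical SKETCH_ names so it builds with older
headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/*
 * ASSUMPTION: values match MEMBARRIER_CMD_QUERY and
 * MEMBARRIER_CMD_SHARED from <linux/membarrier.h> in later kernels.
 */
#define SKETCH_MEMBARRIER_CMD_QUERY  0
#define SKETCH_MEMBARRIER_CMD_SHARED 1

/*
 * Returns 1 if the kernel issued a barrier on behalf of all running
 * threads, 0 if the call is unavailable and the caller must use
 * another scheme (signals, or barriers on the read side).
 */
int process_wide_barrier(void)
{
#ifdef __NR_membarrier
    /* QUERY returns a bitmask of supported commands, or -1 on error. */
    long mask = syscall(__NR_membarrier, SKETCH_MEMBARRIER_CMD_QUERY, 0);

    if (mask > 0 && (mask & SKETCH_MEMBARRIER_CMD_SHARED)) {
        if (syscall(__NR_membarrier,
                    SKETCH_MEMBARRIER_CMD_SHARED, 0) == 0)
            return 1;
    }
#endif
    return 0;
}
```

The point of the explicit interface is that the kernel, not the
library, decides how the barrier is delivered on each architecture, so
hardware TLB shootdown no longer matters to the caller.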

Thanks,

Mathieu

> 
> Alan
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 22:30         ` Mathieu Desnoyers
@ 2015-03-13  8:07           ` Ingo Molnar
  2015-03-13 14:18             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-03-13  8:07 UTC
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, Michael Sullivan, lttng-dev, LKML,
	Paul E. McKenney, Peter Zijlstra, Thomas Gleixner,
	Steven Rostedt


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- Original Message -----
> > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > Sent: Thursday, March 12, 2015 5:47:05 PM
> > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > 
> > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> > >
> > > So the question as it stands appears to be: would you be comfortable
> > > having users abuse mprotect(), relying on its side-effect of issuing
> > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > an effective implementation of process-wide memory barrier ?
> > 
> > Be *very* careful.
> > 
> > Just yesterday, in another thread (discussing the auto-numa TLB 
> > performance regression), we were discussing skipping the TLB 
> > invalidates entirely if the mprotect relaxes the protections.

We have such code already in mm/mprotect.c, introduced in:

  10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries

which does:

                                /* Avoid TLB flush if possible */
                                if (pte_protnone(oldpte))
                                        continue;

> > Because if you *used* to be read-only, and then mprotect() 
> > something so that it is read-write, there really is no need to 
> > send a TLB invalidate, at least on x86. You can just change the 
> > page tables, and *if* any entries are stale in the TLB they'll 
> > take a microfault on access and then just reload the TLB.
> > 
> > So mprotect() to a more permissive mode is not necessarily 
> > serializing.
> 
> The idea here is to always mprotect() to a more restrictive mode, 
> which should trigger the TLB shootdown.

So what happens if a CPU comes around that integrates TLB shootdown 
management into its cache coherency protocol? In such a case IPI 
traffic can be skipped: the memory bus messages take care of TLB 
flushes in most cases.

It's a natural optimization IMHO, because TLB flushes are conceptually 
pretty close to the synchronization mechanisms inherent in data cache 
coherency protocols:

This could be implemented for example by a CPU that knows about ptes 
and handles their modification differently: when a pte is modified it 
will broadcast a MESI invalidation message not just for the cacheline 
belonging to the pte's physical address, but also an 'invalidate TLB' 
MESI message for the pte value's page.

The TLB shootdown would either be guaranteed within the MESI 
transaction, or there would be a deterministic timing 
guarantee, or some explicit synchronization mechanism (new 
instruction) to make sure the remote TLB(s) got shot down.

Every form of this would be way faster than sending interrupts. New 
OSs could support this by the hardware telling them in which cases the 
TLBs are 'auto-flushed', while old OSs would still be compatible by 
sending (now pointless) TLB shootdown IPIs.

So it's a relatively straightforward hardware optimization IMHO: 
assuming TLB flushes are considered important enough to complicate the 
cacheline state machine (which I think they currently aren't).

So in this case there's no interrupt and no other interruption of the 
remote CPU's flow of execution in any fashion that could advance the 
RCU state machine.

What do you think?

Thanks,

	Ingo


* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-13  8:07           ` Ingo Molnar
@ 2015-03-13 14:18             ` Paul E. McKenney
  2015-03-23  9:35               ` [lttng-dev] " Duncan Sands
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2015-03-13 14:18 UTC
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Linus Torvalds, Michael Sullivan, lttng-dev,
	LKML, Peter Zijlstra, Thomas Gleixner, Steven Rostedt

On Fri, Mar 13, 2015 at 09:07:43AM +0100, Ingo Molnar wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > ----- Original Message -----
> > > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > > Sent: Thursday, March 12, 2015 5:47:05 PM
> > > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > > 
> > > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > So the question as it stands appears to be: would you be comfortable
> > > > having users abuse mprotect(), relying on its side-effect of issuing
> > > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > > an effective implementation of process-wide memory barrier ?
> > > 
> > > Be *very* careful.
> > > 
> > > Just yesterday, in another thread (discussing the auto-numa TLB 
> > > performance regression), we were discussing skipping the TLB 
> > > invalidates entirely if the mprotect relaxes the protections.
> 
> We have such code already in mm/mprotect.c, introduced in:
> 
>   10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries
> 
> which does:
> 
>                                 /* Avoid TLB flush if possible */
>                                 if (pte_protnone(oldpte))
>                                         continue;
> 
> > > Because if you *used* to be read-only, and then mprotect() 
> > > something so that it is read-write, there really is no need to 
> > > send a TLB invalidate, at least on x86. You can just change the 
> > > page tables, and *if* any entries are stale in the TLB they'll 
> > > take a microfault on access and then just reload the TLB.
> > > 
> > > So mprotect() to a more permissive mode is not necessarily 
> > > serializing.
> > 
> > The idea here is to always mprotect() to a more restrictive mode, 
> > which should trigger the TLB shootdown.
> 
> So what happens if a CPU comes around that integrates TLB shootdown 
> management into its cache coherency protocol? In such a case IPI 
> traffic can be skipped: the memory bus messages take care of TLB 
> flushes in most cases.
> 
> It's a natural optimization IMHO, because TLB flushes are conceptually 
> pretty close to the synchronization mechanisms inherent in data cache 
> coherency protocols:
> 
> This could be implemented for example by a CPU that knows about ptes 
> and handles their modification differently: when a pte is modified it 
> will broadcast a MESI invalidation message not just for the cacheline 
> belonging to the pte's physical address, but also an 'invalidate TLB' 
> MESI message for the pte value's page.
> 
> The TLB shootdown would either be guaranteed within the MESI 
> transaction, or there would be a deterministic timing 
> guarantee, or some explicit synchronization mechanism (new 
> instruction) to make sure the remote TLB(s) got shot down.
> 
> Every form of this would be way faster than sending interrupts. New 
> OSs could support this by the hardware telling them in which cases the 
> TLBs are 'auto-flushed', while old OSs would still be compatible by 
> sending (now pointless) TLB shootdown IPIs.
> 
> So it's a relatively straightforward hardware optimization IMHO: 
> assuming TLB flushes are considered important enough to complicate the 
> cacheline state machine (which I think they currently aren't).
> 
> So in this case there's no interrupt and no other interruption of the 
> remote CPU's flow of execution in any fashion that could advance the 
> RCU state machine.
> 
> What do you think?

I agree -- there really have been systems able to flush remote TLBs
without interrupting the remote CPU.

So, given the fact that the userspace RCU library does now see
some real-world use, is it now time for Mathieu to resubmit his
sys_membarrier() patch?

							Thanx, Paul



* Re: Alternative to signals/sys_membarrier() in liburcu
  2015-03-12 21:12         ` Paul E. McKenney
@ 2015-03-14 21:06           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2015-03-14 21:06 UTC
  To: paulmck
  Cc: Mathieu Desnoyers, Michael Sullivan, Peter Zijlstra, LKML,
	Steven Rostedt, lttng-dev, Thomas Gleixner, Linus Torvalds,
	Ingo Molnar, linux-arch

On Thu, 2015-03-12 at 14:12 -0700, Paul E. McKenney wrote:
> 
> Are there any architectures left that use hardware-assisted global
> TLB invalidation?  

ARM and PowerPC at least...

Cheers,
Ben.

> On such an architecture, you might not get a memory
> barrier except on the CPU executing the mprotect() or munmap().
> 
> (Here is hoping that no one does -- it is a cute abuse^Whack
> otherwise!)
> 
>   




* Re: [lttng-dev] Alternative to signals/sys_membarrier() in liburcu
  2015-03-13 14:18             ` Paul E. McKenney
@ 2015-03-23  9:35               ` Duncan Sands
  0 siblings, 0 replies; 10+ messages in thread
From: Duncan Sands @ 2015-03-23  9:35 UTC
  To: paulmck, Ingo Molnar
  Cc: Michael Sullivan, Peter Zijlstra, LKML, Steven Rostedt,
	lttng-dev, Thomas Gleixner, Linus Torvalds

> So, given the fact that the userspace RCU library does now see
> some real-world use, is it now time for Mathieu to resubmit his
> sys_membarrier() patch?

I'm using userspace RCU with success in financial software, so the LTTng project 
isn't the only user.  It works well, but it's not as fast as I'd like.  My 
profiling shows that the performance hit is coming from the memory barriers.  So 
I would very much like to see sys_membarrier go in.

Best wishes, Duncan.


