* [CORE TOPIC] lightweight per-cpu locks / restartable sequences
@ 2015-07-09 18:32 Andy Lutomirski
  2015-07-09 19:09 ` [Ksummit-discuss] " Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2015-07-09 18:32 UTC (permalink / raw)
  To: ksummit-discuss, linux-kernel
  Cc: Peter Zijlstra, Paul Turner, Mathieu Desnoyers

Several people have suggested that Linux should provide users with a
lightweight mechanism for fancy per-cpu operations.  This could be
used to implement free lists or counters without any barriers or
atomic operations, for example.

There are at least three approaches floating around.  Paul Turner
proposed a single block of userspace code that aborts if it's
preempted -- within that block, percpu variables can be used safely.
Mathieu Desnoyers proposed a more complex variant.  I proposed a much
simpler approach of just offering percpu gs bases on x86, allowing
cmpxchg (as opposed to lock cmpxchg) to access percpu variables.
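
A rough sketch of the cmpxchg idea, assuming x86-64 and GCC/Clang inline
assembly.  cmpxchg without the lock prefix is a single instruction, so it
cannot be split by an interrupt or by preemption on the CPU executing it,
but it gives no guarantee against other CPUs.  The helper names
cmpxchg_local_u64() and counter_add_local() are made up for illustration,
and the plain pointer stands in for what would really be a gs-relative
per-cpu address maintained by the kernel, which this sketch does not
provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-exchange without the lock prefix.
 * Only safe if *ptr is touched by a single CPU.
 */
static inline bool cmpxchg_local_u64(uint64_t *ptr, uint64_t old,
				     uint64_t new_val)
{
	assert(ptr != NULL);
#if defined(__x86_64__)
	uint64_t prev = old;	/* cmpxchg compares RAX with *ptr */

	__asm__ __volatile__("cmpxchgq %2, %1"
			     : "+a"(prev), "+m"(*ptr)
			     : "r"(new_val)
			     : "cc", "memory");
	return prev == old;
#else
	/* Portable fallback: fully atomic, so stronger than needed. */
	return __atomic_compare_exchange_n(ptr, &old, new_val, false,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
#endif
}

/* Add delta to a counter owned by the current CPU, retrying on failure. */
static inline uint64_t counter_add_local(uint64_t *ctr, uint64_t delta)
{
	uint64_t old;

	do {
		old = *(volatile uint64_t *)ctr;
	} while (!cmpxchg_local_u64(ctr, old, old + delta));
	return old + delta;
}
```

Without a per-cpu gs base, a thread could still migrate between picking
its counter and executing the cmpxchg, which is exactly the gap the
kernel-side support would have to close.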

None of these should be hard to implement, but it would be nice to
hash out whether the kernel should support such a mechanism at all
and, if so, what it would look like.

Jon Corbet unsurprisingly has a nice writeup here:

http://lwn.net/SubscriberLink/650333/f23d07040a58cd46/

--Andy


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-09 18:32 [CORE TOPIC] lightweight per-cpu locks / restartable sequences Andy Lutomirski
@ 2015-07-09 19:09 ` Chris Mason
  2015-07-10 17:26   ` Christoph Lameter
  2015-07-22 14:03   ` Lai Jiangshan
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Mason @ 2015-07-09 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: ksummit-discuss, linux-kernel, Peter Zijlstra, Mathieu Desnoyers,
	Jens Axboe, Shaohua Li

On Thu, Jul 09, 2015 at 11:32:45AM -0700, Andy Lutomirski wrote:
> Several people have suggested that Linux should provide users with a
> lightweight mechanism for fancy per-cpu operations.  This could be
> used to implement free lists or counters without any barriers or
> atomic operations, for example.
> 
> There are at least three approaches floating around.  Paul Turner
> proposed a single block of userspace code that aborts if it's
> preempted -- within that block, percpu variables can be used safely.
> Mathieu Desnoyers proposed a more complex variant.  I proposed a much
> simpler approach of just offering percpu gs bases on x86, allowing
> cmpxchg (as opposed to lock cmpxchg) to access percpu variables.
> 
> None of these should be hard to implement, but it would be nice to
> hash out whether the kernel should support such a mechanism at all
> and, if so, what it would look like.

[ added Jens and Shaohua ]

We've started experimenting with these to cut overheads in a few
critical places, and while we don't have numbers yet I really hope it
won't take too long.

I think the topic is really interesting and we'll be able to get numbers
from production workloads to help justify and compare different
approaches.

-chris


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-09 19:09 ` [Ksummit-discuss] " Chris Mason
@ 2015-07-10 17:26   ` Christoph Lameter
  2015-07-13  9:57     ` Peter Zijlstra
  2015-07-22 14:03   ` Lai Jiangshan
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2015-07-10 17:26 UTC (permalink / raw)
  To: Chris Mason
  Cc: Andy Lutomirski, ksummit-discuss, Peter Zijlstra, linux-kernel,
	Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Thu, 9 Jul 2015, Chris Mason wrote:

> I think the topic is really interesting and we'll be able to get numbers
> from production workloads to help justify and compare different
> approaches.

Ok that would be important. I also think that the approach may be used
in kernel to reduce the overhead of CONFIG_PREEMPT and also to implement
fast versions of this_cpu_ops for non x86 architectures and maybe even
optimize the x86 variants if interrupts also can detect critical sections
and restart at defined points.





* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-10 17:26   ` Christoph Lameter
@ 2015-07-13  9:57     ` Peter Zijlstra
  2015-07-13 14:01       ` Christoph Lameter
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Peter Zijlstra @ 2015-07-13  9:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Chris Mason, Andy Lutomirski, ksummit-discuss, linux-kernel,
	Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Fri, Jul 10, 2015 at 12:26:21PM -0500, Christoph Lameter wrote:
> On Thu, 9 Jul 2015, Chris Mason wrote:
> 
> > I think the topic is really interesting and we'll be able to get numbers
> > from production workloads to help justify and compare different
> > approaches.
> 
> Ok that would be important. I also think that the approach may be used
> in kernel to reduce the overhead of CONFIG_PREEMPT and also to implement
> fast versions of this_cpu_ops for non x86 architectures and maybe even

There is nothing stopping people from trying this in-kernel, in fact
that would be lots easier as we do not have to commit to any one
specific ABI for that.

Also, I don't think we need a schedule check for the in-kernel usage,
pure interrupt should be good enough, nobody should (want to) call
schedule() while inside such a critical section, which leaves us with
involuntary preemption, and those are purely interrupt driven.

Now the 'problem' is finding these special regions fast, the easy
solution is the same as the one proposed for userspace, one big section.
That way the interrupt only has to check if the IP is inside this
section which is minimal effort.
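
To make the range check concrete, here is a minimal sketch, assuming the
whole restartable region is described by three addresses and has a single
restart point.  The struct and function names are invented for
illustration; nothing like this exists in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the single restartable section. */
struct restart_section {
	uintptr_t start;	/* first byte of the section */
	uintptr_t end;		/* one past the last byte */
	uintptr_t restart_ip;	/* where an interrupted sequence resumes */
};

/*
 * Called on interrupt return with the interrupted IP.  Returns the IP
 * to resume at: the restart point if the section was interrupted,
 * otherwise the original IP.
 */
static inline uintptr_t fixup_interrupted_ip(const struct restart_section *s,
					     uintptr_t ip)
{
	assert(s->start <= s->end);
	/* One unsigned compare covers both bounds. */
	if (ip - s->start < s->end - s->start)
		return s->restart_ip;
	return ip;
}
```

The cost on the interrupt path is one subtract and one compare, which is
the "minimal effort" part; the price is that every op has to live inside
[start, end).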

The down side is that all percpu ops would then end up being full
function calls. Which on some archs is indeed faster than disabling
interrupts, but not by much I'm afraid.

> optimize the x86 variants if interrupts also can detect critical sections
> and restart at defined points.

I really don't see how we can beat %GS prefixes with any such scheme.


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-13  9:57     ` Peter Zijlstra
@ 2015-07-13 14:01       ` Christoph Lameter
  2015-07-14 20:00         ` Andy Lutomirski
  2015-07-22 14:22       ` Lai Jiangshan
  2015-07-22 14:34       ` Lai Jiangshan
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Lameter @ 2015-07-13 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Mason, Andy Lutomirski, ksummit-discuss, linux-kernel,
	Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Mon, 13 Jul 2015, Peter Zijlstra wrote:

> Now the 'problem' is finding these special regions fast, the easy
> solution is the same as the one proposed for userspace, one big section.
> That way the interrupt only has to check if the IP is inside this
> section which is minimal effort.
>
> The down side is that all percpu ops would then end up being full
> function calls. Which on some archs is indeed faster than disabling
> interrupts, but not by much I'm afraid.

Well one could move the entire functions that are using these ops into the
special sections. That is certainly an area requiring much more thought.

> > optimize the x86 variants if interrupts also can detect critical sections
> > and restart at defined points.
>
> I really don't see how we can beat %GS prefixes with any such scheme.

We may be able to avoid RMW (read-modify-write) sequences, which allows
the processor to better schedule operations.



* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-13 14:01       ` Christoph Lameter
@ 2015-07-14 20:00         ` Andy Lutomirski
  2015-07-14 21:15           ` Christoph Lameter
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2015-07-14 20:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Chris Mason, ksummit-discuss, linux-kernel,
	Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Mon, Jul 13, 2015 at 7:01 AM, Christoph Lameter <cl@linux.com> wrote:
> On Mon, 13 Jul 2015, Peter Zijlstra wrote:
>
>> Now the 'problem' is finding these special regions fast, the easy
>> solution is the same as the one proposed for userspace, one big section.
>> That way the interrupt only has to check if the IP is inside this
>> section which is minimal effort.
>>
>> The down side is that all percpu ops would then end up being full
>> function calls. Which on some archs is indeed faster than disabling
>> interrupts, but not by much I'm afraid.
>
> Well one could move the entire functions that are using these ops into the
> special sections. That is certainly an area requiring much more thought.

Hmm.

>
>> > optimize the x86 variants if interrupts also can detect critical sections
>> > and restart at defined points.
>>
>> I really don't see how we can beat %GS prefixes with any such scheme.
>
> We may be able to avoid RMW (read-modify-write) sequences, which allows
> the processor to better schedule operations.

True, but cmpxchg is, surprisingly, pretty fast.

Crazy thought: At the risk of proposing something ridiculous, what if
we had per-cpu memory mappings?  We could do this at the cost of up to
2kB of memcpy whenever we switch mms.  Expensive but maybe not a
showstopper.
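
A back-of-the-envelope sketch of where a 2kB figure could come from.
This is an assumption, not something spelled out above: on x86-64 with
4-level paging the top-level table has 512 eight-byte entries, and the
lower 256 of them (2048 bytes) cover userspace.  If each CPU had its own
top-level table, an mm switch would copy those user entries and then
patch in one per-cpu entry.  All names and the slot argument are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGD_ENTRIES      512u			/* x86-64, 4-level paging */
#define USER_PGD_ENTRIES (PGD_ENTRIES / 2)	/* lower half maps userspace */

typedef uint64_t pgd_entry_t;

/*
 * Hypothetical mm switch with per-cpu top-level tables: copy the user
 * half of the mm's table into this CPU's table, then install the
 * per-cpu mapping.  Returns the number of bytes copied.
 */
static inline size_t switch_mm_percpu_pgd(pgd_entry_t *cpu_pgd,
					  const pgd_entry_t *mm_pgd,
					  unsigned int percpu_slot,
					  pgd_entry_t percpu_entry)
{
	const size_t bytes = USER_PGD_ENTRIES * sizeof(pgd_entry_t);

	assert(percpu_slot < USER_PGD_ENTRIES);
	memcpy(cpu_pgd, mm_pgd, bytes);
	cpu_pgd[percpu_slot] = percpu_entry;
	return bytes;
}
```

The sketch ignores keeping the per-cpu copies in sync when the mm's own
top-level entries change, which would be part of the real cost.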

--Andy


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-14 20:00         ` Andy Lutomirski
@ 2015-07-14 21:15           ` Christoph Lameter
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Lameter @ 2015-07-14 21:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Chris Mason, ksummit-discuss, linux-kernel,
	Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Tue, 14 Jul 2015, Andy Lutomirski wrote:

> Crazy thought: At the risk of proposing something ridiculous, what if
> we had per-cpu memory mappings?  We could do this at the cost of up to
> 2kB of memcpy whenever we switch mms.  Expensive but maybe not a
> showstopper.

This is not crazy and actually was done before. Itanium has that and
it's doable since the TLB insertion could be handled in software.

The problem on x86 is that one would need a separate page table for each
processor for each task. There is no way to handle TLB faults in
software to my knowledge.




* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-09 19:09 ` [Ksummit-discuss] " Chris Mason
  2015-07-10 17:26   ` Christoph Lameter
@ 2015-07-22 14:03   ` Lai Jiangshan
  1 sibling, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2015-07-22 14:03 UTC (permalink / raw)
  To: Chris Mason, Andy Lutomirski, ksummit-discuss, linux-kernel,
	Peter Zijlstra, Mathieu Desnoyers, Jens Axboe, Shaohua Li,
	Paul E. McKenney

On Fri, Jul 10, 2015 at 3:09 AM, Chris Mason <clm@fb.com> wrote:

>
> We've started experimenting with these to cut overheads in a few
> critical places, and while we don't have numbers yet I really hope it
> won't take too long.
>
> I think the topic is really interesting and we'll be able to get numbers
> from production workloads to help justify and compare different
> approaches.
>

I have been interested in the idea since Paul (paulmck) and Mathieu
introduced it to me at Kernel Summit 2013.  I didn't expect it to be
re-posted on LKML so late.  IMHO, the direction is useful and helpful,
not just fun; I hope we can make some progress on it.

Thanks
Lai

> -chris


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-13  9:57     ` Peter Zijlstra
  2015-07-13 14:01       ` Christoph Lameter
@ 2015-07-22 14:22       ` Lai Jiangshan
  2015-07-22 14:34       ` Lai Jiangshan
  2 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2015-07-22 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Chris Mason, Andy Lutomirski, ksummit-discuss,
	linux-kernel, Jens Axboe, Mathieu Desnoyers, Shaohua Li,
	Paul E. McKenney

On Mon, Jul 13, 2015 at 5:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jul 10, 2015 at 12:26:21PM -0500, Christoph Lameter wrote:
>> On Thu, 9 Jul 2015, Chris Mason wrote:
>>
>> > I think the topic is really interesting and we'll be able to get numbers
>> > from production workloads to help justify and compare different
>> > approaches.
>>
>> Ok that would be important. I also think that the approach may be used
>> in kernel to reduce the overhead of CONFIG_PREEMPT and also to implement
>> fast versions of this_cpu_ops for non x86 architectures and maybe even
>
>
> Also, I don't think we need a schedule check for the in-kernel usage,
> pure interrupt should be good enough, nobody should (want to) call
> schedule() while inside such a critical section, which leaves us with
> involuntary preemption, and those are purely interrupt driven.
>
> Now the 'problem' is finding these special regions fast, the easy
> solution is the same as the one proposed for userspace, one big section.
> That way the interrupt only has to check if the IP is inside this
> section which is minimal effort.
>
> The down side is that all percpu ops would then end up being full
> function calls. Which on some archs is indeed faster than disabling
> interrupts, but not by much I'm afraid.

Another downside is that percpu ops can't call any function outside
the section.  Otherwise we would fail to detect whether we are in a
special region, or it would be hard to detect.

If we disallow percpu ops from calling any function, I think we can
insert some special instructions into the generated code along with
an annotation in a table (like the exception table for copy_to_user()).
The interrupt then only has to check for the special instructions
near the IP and confirm by looking the IP up in the table.
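
A minimal sketch of the table side of this, assuming a sorted,
non-overlapping array of regions looked up by binary search, in the
spirit of the exception table.  The entry layout and function name are
invented for illustration, and the "special instruction near the IP"
pre-filter is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, one per restartable percpu op. */
struct restart_entry {
	uintptr_t start;	/* first byte of the region */
	uintptr_t end;		/* one past the last byte */
	uintptr_t restart_ip;	/* where to resume if interrupted */
};

/*
 * Binary search over a table sorted by start address.  Returns the
 * entry whose region contains ip, or NULL if ip is in no region.
 */
static inline const struct restart_entry *
search_restart_table(const struct restart_entry *tbl, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < tbl[mid].start)
			hi = mid;
		else if (ip >= tbl[mid].end)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

Compared with one big section, the regions can then sit inline in
ordinary functions, at the price of an O(log n) lookup whenever the
cheap pre-filter matches.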

>
>> optimize the x86 variants if interrupts also can detect critical sections
>> and restart at defined points.
>
> I really don't see how we can beat %GS prefixes with any such scheme.


* Re: [Ksummit-discuss] [CORE TOPIC] lightweight per-cpu locks / restartable sequences
  2015-07-13  9:57     ` Peter Zijlstra
  2015-07-13 14:01       ` Christoph Lameter
  2015-07-22 14:22       ` Lai Jiangshan
@ 2015-07-22 14:34       ` Lai Jiangshan
  2 siblings, 0 replies; 10+ messages in thread
From: Lai Jiangshan @ 2015-07-22 14:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Lameter, Chris Mason, Andy Lutomirski, ksummit-discuss,
	linux-kernel, Jens Axboe, Mathieu Desnoyers, Shaohua Li

On Mon, Jul 13, 2015 at 5:57 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jul 10, 2015 at 12:26:21PM -0500, Christoph Lameter wrote:
>> On Thu, 9 Jul 2015, Chris Mason wrote:
>>
>> > I think the topic is really interesting and we'll be able to get numbers
>> > from production workloads to help justify and compare different
>> > approaches.
>>
>> Ok that would be important. I also think that the approach may be used
>> in kernel to reduce the overhead of CONFIG_PREEMPT and also to implement
>> fast versions of this_cpu_ops for non x86 architectures and maybe even
>
> There is nothing stopping people from trying this in-kernel, in fact
> that would be lots easier as we do not have to commit to any one
> specific ABI for that.

It also gives us a nicer way to deal with NMIs and
to modify a slightly bigger struct irq-safely
if we have it in-kernel.

>
> Also, I don't think we need a schedule check for the in-kernel usage,
> pure interrupt should be good enough, nobody should (want to) call
> schedule() while inside such a critical section, which leaves us with
> involuntary preemption, and those are purely interrupt driven.
>
> Now the 'problem' is finding these special regions fast, the easy
> solution is the same as the one proposed for userspace, one big section.
> That way the interrupt only has to check if the IP is inside this
> section which is minimal effort.
>
> The down side is that all percpu ops would then end up being full
> function calls. Which on some archs is indeed faster than disabling
> interrupts, but not by much I'm afraid.
>
>> optimize the x86 variants if interrupts also can detect critical sections
>> and restart at defined points.
>
> I really don't see how we can beat %GS prefixes with any such scheme.

