linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC] perf: fix building for ARCv1
       [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
@ 2015-10-17 14:19 ` Vineet Gupta
  2015-10-18 11:15   ` Alexey Brodkin
  2015-10-30  6:21 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
  1 sibling, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-17 14:19 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Peter Zijlstra, Aabid Rushdi, lkml, linux-perf-users, Darren Hart

On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC achitecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ----------------->8-----------------
> undefined reference to `__sync_add_and_fetch_4'
> ----------------->8-----------------
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.

Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
that support.
So what we are saying is that any arch (or a configuration thereof) which doesn't
support atomic r-m-w can't even build perf now - that sucks !

A better way would be to do feature test for __sync_xyz and make atomic_xxx
wrappers call __sync_xyz) vs. an empty stub.
So atleast such arches can build and do "some" perf work !

-Vineet

> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
@ 2015-10-18 11:15   ` Alexey Brodkin
  2015-10-18 23:14     ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Alexey Brodkin @ 2015-10-18 11:15 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart,
	dsahern, acme

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3287 bytes --]

Hi Vineet,

Looks like this time atomics are a must. And that really sucks!

See these commits that introduce usage of atomic_xxx() all around the perf and tools it uses:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f812d3045c2385ac16237e68b156859c4005526e
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3a7c489c7fd2463e3b2c3a2179c7be879dd9cb4
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7143849a5d6a5c623d81790d92f0033507c5b14f
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=59a51c1dc9fbb3fb4af928b852d7b35df83edd74
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1ed3a5b87ed6759e16ec93f16aae83d2cc77ca2

and that's the one that introduced usage of the following generic gcc's atomics
(__sync_add_and_fetch/__sync_sub_and_fetch):
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da6d8567512df11e0473b710c07de87efde5709c

So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
perf building.

-Alexey

On Sat, 2015-10-17 at 14:19 +0000, Vineet Gupta wrote:
> On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> > Perf uses atomic options and so it is required to have atomics enabled
> > in toolchain.
> > 
> > In case of ARC atomics are enabled by default for ARCv2 but disabled for
> > ARCv1. Now we explicitly enable atomics for either ARC achitecture
> > version so perf could be successfully built.
> > 
> > Currently on attempt to build perf for ARCv1 you'll see tons of:
> > ----------------->8-----------------
> > undefined reference to `__sync_add_and_fetch_4'
> > ----------------->8-----------------
> > 
> > Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> > execution once "llock" instruction is attempted to be executed.
> 
> Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o
> that support.
> So what we are saying is that any arch (or a configuration thereof) which doesn't
> support atomic r-m-w can't even build perf now - that sucks !
> 
> A better way would be to do feature test for __sync_xyz and make atomic_xxx
> wrappers call __sync_xyz) vs. an empty stub.
> So atleast such arches can build and do "some" perf work !
> 
> -Vineet
> 
> > Cc: Vineet Gupta <vgupta@synopsys.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
> > ---
> >  tools/perf/config/Makefile | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> > index 38a0853..dc7c0a8 100644
> > --- a/tools/perf/config/Makefile
> > +++ b/tools/perf/config/Makefile
> > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
> >    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
> >  endif
> >  
> > +# Additional ARCH settings for ARC
> > +ifeq ($(ARCH),arc)
> > +  CFLAGS += -matomic
> > +endif
> > +
> >  ifeq ($(NO_PERF_REGS),0)
> >    $(call detected,CONFIG_PERF_REGS)
> >  endif
> 
> ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-18 11:15   ` Alexey Brodkin
@ 2015-10-18 23:14     ` Andi Kleen
  2015-10-19  4:58       ` Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2015-10-18 23:14 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Vineet Gupta, Aabid Rushdi, linux-kernel, peterz,
	linux-perf-users, dvhart, dsahern, acme

Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes:
>
> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
> perf building.

If your target is single cpu only you can always simulate them in C.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-18 23:14     ` Andi Kleen
@ 2015-10-19  4:58       ` Vineet Gupta
  2015-10-19  5:49         ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-19  4:58 UTC (permalink / raw)
  To: Andi Kleen, Alexey Brodkin
  Cc: Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart,
	dsahern, acme

On Monday 19 October 2015 04:45 AM, Andi Kleen wrote:
> Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes:
>> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with
>> perf building.
> If your target is single cpu only you can always simulate them in C.
>
> -Andi

But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
C just by itself.

-Vineet

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  4:58       ` Vineet Gupta
@ 2015-10-19  5:49         ` Andi Kleen
  2015-10-19  9:28           ` Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2015-10-19  5:49 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel, peterz,
	linux-perf-users, dvhart, dsahern, acme

Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>
> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
> C just by itself.

It matters when you access the perf ring buffer which is updated by kernel.
Also perf is now multi threaded to some degree.

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  5:49         ` Andi Kleen
@ 2015-10-19  9:28           ` Vineet Gupta
  2015-10-19  9:35             ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-19  9:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel, peterz,
	linux-perf-users, dvhart, dsahern, acme

On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>> C just by itself.
> It matters when you access the perf ring buffer which is updated by kernel.

That's part of the problem. The issue is with atomic_* APIs proliferation in perf
user space code which assumes native atomix r-m-w support which is not always
true. So I think we still need a feature detection mechanism and if absent leave
the ball in arch court by calling arch_atomic_* which can use creative or half
working measures so perf will work to some extent atleast and not bomb outright.

Also can u please elaborate a bit on "simulate them in C" - u mean just simple
unprotected LD, OP, ST or do u fancy usage of futex etc?

> Also perf is now multi threaded to some degree.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:28           ` Vineet Gupta
@ 2015-10-19  9:35             ` Peter Zijlstra
  2015-10-19  9:46               ` Vineet Gupta
  2016-10-18 18:58               ` Vineet Gupta
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2015-10-19  9:35 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel,
	linux-perf-users, dvhart, dsahern, acme

On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
> > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
> >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
> >> C just by itself.
> > It matters when you access the perf ring buffer which is updated by kernel.
> 
> That's part of the problem. The issue is with atomic_* APIs proliferation in perf
> user space code which assumes native atomix r-m-w support which is not always
> true. So I think we still need a feature detection mechanism and if absent leave
> the ball in arch court by calling arch_atomic_* which can use creative or half
> working measures so perf will work to some extent atleast and not bomb outright.
> 
> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
> unprotected LD, OP, ST or do u fancy usage of futex etc?

Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
IRQ-disabled load-op-store sequence.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:35             ` Peter Zijlstra
@ 2015-10-19  9:46               ` Vineet Gupta
  2015-10-19  9:51                 ` Peter Zijlstra
  2016-10-18 18:58               ` Vineet Gupta
  1 sibling, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-19  9:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel,
	linux-perf-users, dvhart, dsahern, acme

On Monday 19 October 2015 03:05 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
>> > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>>>> > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> > >> C just by itself.
>>> > > It matters when you access the perf ring buffer which is updated by kernel.
>> > 
>> > That's part of the problem. The issue is with atomic_* APIs proliferation in perf
>> > user space code which assumes native atomix r-m-w support which is not always
>> > true. So I think we still need a feature detection mechanism and if absent leave
>> > the ball in arch court by calling arch_atomic_* which can use creative or half
>> > working measures so perf will work to some extent atleast and not bomb outright.
>> > 
>> > Also can u please elaborate a bit on "simulate them in C" - u mean just simple
>> > unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

Yeah I remember seeing some syscall like that in ARM.

On ARC we could use the atomic EXchange to implement a user space only binary
semaphore - these atomic ops will be small duration so it is OK to spin wait for a
little bit. That's how the old pthread library worked for ARC w/o any atomic support.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:46               ` Vineet Gupta
@ 2015-10-19  9:51                 ` Peter Zijlstra
  2015-10-19 10:04                   ` Vineet Gupta
  2015-10-20  8:00                   ` Vineet Gupta
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2015-10-19  9:51 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel,
	linux-perf-users, dvhart, dsahern, acme

On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
> On ARC we could use the atomic EXchange to implement a user space only binary
> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
> little bit. That's how the old pthread library worked for ARC w/o any atomic support.

That has the obvious problem of lock-holder-preemption and the horrible
performance issues that result from that.

I think the syscall at least has deterministic behaviour, whereas that
userspace spin loop has this abysmal worst case thing.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:51                 ` Peter Zijlstra
@ 2015-10-19 10:04                   ` Vineet Gupta
  2015-10-20  8:00                   ` Vineet Gupta
  1 sibling, 0 replies; 24+ messages in thread
From: Vineet Gupta @ 2015-10-19 10:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel,
	linux-perf-users, dvhart, dsahern, acme

On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
>> > On ARC we could use the atomic EXchange to implement a user space only binary
>> > semaphore - these atomic ops will be small duration so it is OK to spin wait for a
>> > little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I agree - we can add that syscall trivially and use it based on build time feature
detection for atomics !

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:51                 ` Peter Zijlstra
  2015-10-19 10:04                   ` Vineet Gupta
@ 2015-10-20  8:00                   ` Vineet Gupta
  2015-10-20 10:11                     ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-20  8:00 UTC (permalink / raw)
  To: Peter Zijlstra, Noam Camus
  Cc: Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel,
	linux-perf-users, dvhart, dsahern, acme

On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
>> On ARC we could use the atomic EXchange to implement a user space only binary
>> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
>> little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> That has the obvious problem of lock-holder-preemption and the horrible
> performance issues that result from that.
>
> I think the syscall at least has deterministic behaviour, whereas that
> userspace spin loop has this abysmal worst case thing.

I don't have issue with adding the syscall per-se. But that comes with it's own
headaches of ABI change - more importantly it requires several things to match,
libc, kernel...  It would be easier if change was confined to say perf.

Can we use existing syscall(s) - again this is what our good old pthread library
code did.

static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  READ_MEMORY_BARRIER();

  while (testandset(spinlock)) {   <---- atomic EXchange
    if (cnt < 50) {
      sched_yield();
      cnt++;
    } else {
      tm.tv_sec = 0;
      tm.tv_nsec = 2000001;
      nanosleep(&tm, ((void *)0));
      cnt = 0;
    }
  }


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20  8:00                   ` Vineet Gupta
@ 2015-10-20 10:11                     ` Peter Zijlstra
  2015-10-20 10:45                       ` Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2015-10-20 10:11 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef,
	linux-kernel, linux-perf-users, dvhart, dsahern, acme,
	Thomas Gleixner

On Tue, Oct 20, 2015 at 08:00:46AM +0000, Vineet Gupta wrote:
> On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote:
> > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote:
> >> On ARC we could use the atomic EXchange to implement a user space only binary
> >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a
> >> little bit. That's how the old pthread library worked for ARC w/o any atomic support.
> > That has the obvious problem of lock-holder-preemption and the horrible
> > performance issues that result from that.
> >
> > I think the syscall at least has deterministic behaviour, whereas that
> > userspace spin loop has this abysmal worst case thing.
> 
> I don't have issue with adding the syscall per-se. But that comes with it's own
> headaches of ABI change - more importantly it requires several things to match,
> libc, kernel...  It would be easier if change was confined to say perf.

OTOH fixing all those would get you a 'sane' system :-)

> Can we use existing syscall(s) - again this is what our good old pthread library
> code did.
> 
> static void __pthread_acquire(int * spinlock)
> {
>   int cnt = 0;
>   struct timespec tm;
> 
>   READ_MEMORY_BARRIER();
> 
>   while (testandset(spinlock)) {   <---- atomic EXchange
>     if (cnt < 50) {
>       sched_yield();
>       cnt++;
>     } else {
>       tm.tv_sec = 0;
>       tm.tv_nsec = 2000001;
>       nanosleep(&tm, ((void *)0));
>       cnt = 0;
>     }
>   }

*shudder* that is quite horrible.

This means all your 'atomics' are broken for anything SCHED_FIFO and the
like. You simply _cannot_ run a realtime system.

(also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
test-and-set control dependency.)

But its your arch..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20 10:11                     ` Peter Zijlstra
@ 2015-10-20 10:45                       ` Vineet Gupta
  2015-10-29 15:58                         ` Alexey Brodkin
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2015-10-20 10:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef,
	linux-kernel, linux-perf-users, dvhart, dsahern, acme,
	Thomas Gleixner

On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>> > Can we use existing syscall(s) - again this is what our good old pthread library
>> > code did.
>> > 
>> > static void __pthread_acquire(int * spinlock)
>> > {
>> >   int cnt = 0;
>> >   struct timespec tm;
>> > 
>> >   READ_MEMORY_BARRIER();
>> > 
>> >   while (testandset(spinlock)) {   <---- atomic EXchange
>> >     if (cnt < 50) {
>> >       sched_yield();
>> >       cnt++;
>> >     } else {
>> >       tm.tv_sec = 0;
>> >       tm.tv_nsec = 2000001;
>> >       nanosleep(&tm, ((void *)0));
>> >       cnt = 0;
>> >     }
>> >   }
> *shudder* that is quite horrible.
>
> This means all your 'atomics' are broken for anything SCHED_FIFO and the
> like. You simply _cannot_ run a realtime system.

The code above is from uClibc old threading library which we don't use anymore.
The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
change we are talking about is only for the atomics in perf itself. I do
understand your POV though.

> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> test-and-set control dependency.)

Absolutely and in this case it will have to be added both inside the loop and one
at the end to cover both the scenarios !


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-20 10:45                       ` Vineet Gupta
@ 2015-10-29 15:58                         ` Alexey Brodkin
  2015-10-30  6:19                           ` Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Alexey Brodkin @ 2015-10-29 15:58 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme,
	linux-perf-users, giladb, dsahern, dvhart, noamc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1769 bytes --]

Hi Vineet,

On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > Can we use existing syscall(s) - again this is what our good old pthread library
> > > > code did.
> > > > 
> > > > static void __pthread_acquire(int * spinlock)
> > > > {
> > > >   int cnt = 0;
> > > >   struct timespec tm;
> > > > 
> > > >   READ_MEMORY_BARRIER();
> > > > 
> > > >   while (testandset(spinlock)) {   <---- atomic EXchange
> > > >     if (cnt < 50) {
> > > >       sched_yield();
> > > >       cnt++;
> > > >     } else {
> > > >       tm.tv_sec = 0;
> > > >       tm.tv_nsec = 2000001;
> > > >       nanosleep(&tm, ((void *)0));
> > > >       cnt = 0;
> > > >     }
> > > >   }
> > *shudder* that is quite horrible.
> > 
> > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > like. You simply _cannot_ run a realtime system.
> 
> The code above is from uClibc old threading library which we don't use anymore.
> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
> change we are talking about is only for the atomics in perf itself. I do
> understand your POV though.
> 
> > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > test-and-set control dependency.)
> 
> Absolutely and in this case it will have to be added both inside the loop and one
> at the end to cover both the scenarios !
> 

I'm wondering what are our plans for now?
Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
or we'll try to come up with more general solution?

-Alexeyÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-29 15:58                         ` Alexey Brodkin
@ 2015-10-30  6:19                           ` Vineet Gupta
  2016-02-03 16:20                             ` Alexey Brodkin
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Vineet Gupta @ 2015-10-30  6:19 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme,
	linux-perf-users, giladb, dsahern, dvhart, noamc

On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> Hi Vineet,
>
> On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
>> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
>>>>> Can we use existing syscall(s) - again this is what our good old pthread library
>>>>> code did.
>>>>>
>>>>> static void __pthread_acquire(int * spinlock)
>>>>> {
>>>>>   int cnt = 0;
>>>>>   struct timespec tm;
>>>>>
>>>>>   READ_MEMORY_BARRIER();
>>>>>
>>>>>   while (testandset(spinlock)) {   <---- atomic EXchange
>>>>>     if (cnt < 50) {
>>>>>       sched_yield();
>>>>>       cnt++;
>>>>>     } else {
>>>>>       tm.tv_sec = 0;
>>>>>       tm.tv_nsec = 2000001;
>>>>>       nanosleep(&tm, ((void *)0));
>>>>>       cnt = 0;
>>>>>     }
>>>>>   }
>>> *shudder* that is quite horrible.
>>>
>>> This means all your 'atomics' are broken for anything SCHED_FIFO and the
>>> like. You simply _cannot_ run a realtime system.
>> The code above is from uClibc old threading library which we don't use anymore.
>> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
>> change we are talking about is only for the atomics in perf itself. I do
>> understand your POV though.
>>
>>> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
>>> test-and-set control dependency.)
>> Absolutely and in this case it will have to be added both inside the loop and one
>> at the end to cover both the scenarios !
>>
> I'm wondering what are our plans for now?
> Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
> or we'll try to come up with more general solution?

I agree with the current solution to add -atomic to for arc700 builds.
Although making that default for arc700 tools will be better but that will not fix
things before next release of tools etc.

But we *do* need to improve generic solution
1. Add atomics detection in perf to add fall back arch stubs
2. ARC needs to add syscall for facilitating atomic r-m-w !

-Vineet

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
       [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
  2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
@ 2015-10-30  6:21 ` Vineet Gupta
  1 sibling, 0 replies; 24+ messages in thread
From: Vineet Gupta @ 2015-10-30  6:21 UTC (permalink / raw)
  To: Alexey Brodkin
  Cc: Peter Zijlstra, linux-perf-users, Arnaldo Carvalho de Melo, lkml,
	linux-snps-arc

On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote:
> Perf uses atomic options and so it is required to have atomics enabled
> in toolchain.
>
> In case of ARC atomics are enabled by default for ARCv2 but disabled for
> ARCv1. Now we explicitly enable atomics for either ARC achitecture
> version so perf could be successfully built.
>
> Currently on attempt to build perf for ARCv1 you'll see tons of:
> ----------------->8-----------------
> undefined reference to `__sync_add_and_fetch_4'
> ----------------->8-----------------
>
> Still note if ARCv1 CPU is configured without LL/SC perf will crash on
> execution once "llock" instruction is attempted to be executed.
>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>

Acked-by: Vineet Gupta <vgupta@synopsys.com>


> ---
>  tools/perf/config/Makefile | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 38a0853..dc7c0a8 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64)
>    LIBUNWIND_LIBS = -lunwind -lunwind-aarch64
>  endif
>  
> +# Additional ARCH settings for ARC
> +ifeq ($(ARCH),arc)
> +  CFLAGS += -matomic
> +endif
> +
>  ifeq ($(NO_PERF_REGS),0)
>    $(call detected,CONFIG_PERF_REGS)
>  endif


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-30  6:19                           ` Vineet Gupta
@ 2016-02-03 16:20                             ` Alexey Brodkin
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Alexey Brodkin @ 2016-02-03 16:20 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme,
	linux-perf-users, giladb, dsahern, dvhart, noamc

Hi Vineet,

On Fri, 2015-10-30 at 06:19 +0000, Vineet Gupta wrote:
> On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote:
> > Hi Vineet,
> > 
> > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote:
> > > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote:
> > > > > > Can we use existing syscall(s) - again this is what our good old pthread library
> > > > > > code did.
> > > > > > 
> > > > > > static void __pthread_acquire(int * spinlock)
> > > > > > {
> > > > > >   int cnt = 0;
> > > > > >   struct timespec tm;
> > > > > > 
> > > > > >   READ_MEMORY_BARRIER();
> > > > > > 
> > > > > >   while (testandset(spinlock)) {   <---- atomic EXchange
> > > > > >     if (cnt < 50) {
> > > > > >       sched_yield();
> > > > > >       cnt++;
> > > > > >     } else {
> > > > > >       tm.tv_sec = 0;
> > > > > >       tm.tv_nsec = 2000001;
> > > > > >       nanosleep(&tm, ((void *)0));
> > > > > >       cnt = 0;
> > > > > >     }
> > > > > >   }
> > > > *shudder* that is quite horrible.
> > > > 
> > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the
> > > > like. You simply _cannot_ run a realtime system.
> > > The code above is from uClibc old threading library which we don't use anymore.
> > > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The
> > > change we are talking about is only for the atomics in perf itself. I do
> > > understand your POV though.
> > > 
> > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the
> > > > test-and-set control dependency.)
> > > Absolutely and in this case it will have to be added both inside the loop and one
> > > at the end to cover both the scenarios !
> > > 
> > I'm wondering what are our plans for now?
> > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then)
> > or we'll try to come up with more general solution?
> 
> I agree with the current solution to add -atomic to for arc700 builds.
> Although making that default for arc700 tools will be better but that will not fix
> things before next release of tools etc.
> 
> But we *do* need to improve generic solution
> 1. Add atomics detection in perf to add fall back arch stubs
> 2. ARC needs to add syscall for facilitating atomic r-m-w !

So the most recent ARC GNU tools (2015.12) were just released yesterday
and still atomics are disabled by default for ARCv1.
That means perf will continue to fail on building for now.

Do you think we may apply my initial fix to 4.5 while it is in development and
then to stable trees as well?

-Alexey

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
       [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
@ 2016-02-04  4:13                               ` Vineet Gupta
  2016-02-05 11:18                                 ` Noam Camus
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2016-02-04  4:13 UTC (permalink / raw)
  To: Alexey Brodkin, Noam Camus
  Cc: linux-perf-users, peterz, dvhart, linux-kernel, acme, andi,
	noamc, dsahern, tglx, linux-snps-arc, giladb

+CC Noam

On Wednesday 03 February 2016 09:50 PM, Alexey Brodkin wrote:
>> I agree with the current solution to add -atomic to for arc700 builds.
>> > Although making that default for arc700 tools will be better but that will not fix
>> > things before next release of tools etc.
>> > 
>> > But we *do* need to improve generic solution
>> > 1. Add atomics detection in perf to add fall back arch stubs
>> > 2. ARC needs to add syscall for facilitating atomic r-m-w !
> So the most recent ARC GNU tools (2015.12) were just released yesterday
> and still atomics are disabled by default for ARCv1.
> That means perf will continue to fail on building for now.
> 
> Do you think we may apply my initial fix to 4.5 while it is in development and
> then to stable trees as well?

Noam, what's the atomic story for EZChip. Do you support such things for user
space in GNU tools. If -atomic is added to perf user space builds are you guys OK!

-Vineet

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2016-02-04  4:13                               ` Vineet Gupta
@ 2016-02-05 11:18                                 ` Noam Camus
       [not found]                                   ` <20160205161027.GG28242@kernel.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Noam Camus @ 2016-02-05 11:18 UTC (permalink / raw)
  To: Vineet Gupta, Alexey Brodkin
  Cc: linux-perf-users, peterz, dvhart, linux-kernel, acme, andi,
	dsahern, tglx, linux-snps-arc, Gilad Ben Yossef



________________________________________
>From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
>Sent: Thursday, February 4, 2016 6:13 AM
>Noam, what's the atomic story for EZChip. Do you support such things for user
space in GNU tools. If -atomic is added to perf user space builds are you guys OK!

Well here for EZchip I also see the:
undefined reference to `__sync_add_and_fetch_4'
undefined reference to `__sync_sub_and_fetch_4'

This is since at file tools/include/asm/atomic.h we use the generic implementation
If for ARC I could use just like x86 my own header file then functions like:
atomic_inc()
atomic_dec_and_test()
Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform.

You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*.
I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC.

So seem that either I have my own header at kernel or that I need to change the GCC implementation for __sync* to use my atomic instructions.
I am personally tend to the x86 solution and not the generic one since changing GCC will require to have new compiler dependency.
 
-Noam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
       [not found]                                   ` <20160205161027.GG28242@kernel.org>
@ 2016-02-10  3:09                                     ` Vineet Gupta
  0 siblings, 0 replies; 24+ messages in thread
From: Vineet Gupta @ 2016-02-10  3:09 UTC (permalink / raw)
  To: acme, Noam Camus
  Cc: peterz, Alexey Brodkin, linux-kernel, linux-perf-users, andi,
	dsahern, tglx, linux-snps-arc, dvhart, Gilad Ben Yossef

On Friday 05 February 2016 09:40 PM, acme@redhat.com wrote:
> Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu:
>> Well here for EZchip I also see the:
>> undefined reference to `__sync_add_and_fetch_4'
>> undefined reference to `__sync_sub_and_fetch_4'
> 
> Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't
> you guys adapt arch/arc/include/asm/atomic.h to use in userspace?

Sure - however we need to support 3 variants: LLSC, !LLSC, EZCHIP

If needed, latter 2 could be done using a new atomic assist syscall

I presume kernel Kconfig items are no go in this header so this diversity
management needs to use toolchain defined macros e.g. __ezchip__


> 
> - Arnaldo
>  
>> This is since at file tools/include/asm/atomic.h we use the generic implementation
>> If for ARC I could use just like x86 my own header file then functions like:
>> atomic_inc()
>> atomic_dec_and_test()
>> Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform.
>>
>> You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*.
>> I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC.
> 
>> So seem that either I have my own header at kernel or that I need to
>> change the GCC implementation for __sync* to use my atomic
>> instructions.  I am personally tend to the x86 solution and not the
>> generic one since changing GCC will require to have new compiler
>> dependency.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] perf: fix building for ARCv1
  2015-10-19  9:35             ` Peter Zijlstra
  2015-10-19  9:46               ` Vineet Gupta
@ 2016-10-18 18:58               ` Vineet Gupta
  2016-10-24 16:17                 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta
  1 sibling, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2016-10-18 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexey Brodkin, linux-kernel, linux-perf-users, Russell King,
	arcml, acme

On 10/19/2015 02:35 AM, Peter Zijlstra wrote:
> On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote:
>> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote:
>>> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes:
>>>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in
>>>> C just by itself.
>>> It matters when you access the perf ring buffer which is updated by kernel.
>> That's part of the problem. The issue is with atomic_* APIs proliferation in perf
>> user space code which assumes native atomix r-m-w support which is not always
>> true. So I think we still need a feature detection mechanism and if absent leave
>> the ball in arch court by calling arch_atomic_* which can use creative or half
>> working measures so perf will work to some extent atleast and not bomb outright.
>>
>> Also can u please elaborate a bit on "simulate them in C" - u mean just simple
>> unprotected LD, OP, ST or do u fancy usage of futex etc?
> Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an
> IRQ-disabled load-op-store sequence.

So I got around to addressing this - now that someone actually is trying to use
NPTL (which uses llock/scond) on ARC700 lacking those instructions. However given
that we are going this route, FWIW ARM kernel got rid of this syscall with
db695c0509d6ec ("ARM: remove user cmpxchg syscall") citing some security hole.
Even of we were to disregard, the code at the time had some open code MM trickery,
which I'd rather not replicate. My use case is simple - I only need to support UP
config - and a simple {get,put}_user would suffice - given that that can
potentially take a TLB refill Miss or worse still a full page fault. I'm going to
cook that patch to add that syscall, but wanted to get some thoughts ahead of that.

-Vineet

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH-v2] ARC: syscall for userspace cmpxchg assist
  2016-10-18 18:58               ` Vineet Gupta
@ 2016-10-24 16:17                 ` Vineet Gupta
  2016-11-04 20:16                   ` Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2016-10-24 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-snps-arc, linux-kernel, Colin Ian King, Alexey.Brodkin,
	Arnd  Bergmann, Vineet Gupta

Older ARC700 cores (ARC750 specifically) lack instructions to implement
atomic r-w-w. This is problematic for userspace libraries such as NPTL
which need atomic primitives. So enable them by providing kernel assist.
This is costly but really the only sane soluton (othern than tight
spinning using the otherwise avaialble atomic exchange EX instruciton).

Good thing is there are only a few of these cores running Linux out in
the wild.

This only works on UP systems.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
Changes since v1
 - errno not returned for access_ok() failing  [Colin]
 - Beefed up change log
 - WARN_ON_ONCE() for CONFIG_SMP since this is only UP safe
---
 arch/arc/include/asm/syscalls.h    |  1 +
 arch/arc/include/uapi/asm/unistd.h |  9 +++++----
 arch/arc/kernel/process.c          | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/arc/include/asm/syscalls.h b/arch/arc/include/asm/syscalls.h
index e56f9fcc5581..772b67ca56e7 100644
--- a/arch/arc/include/asm/syscalls.h
+++ b/arch/arc/include/asm/syscalls.h
@@ -17,6 +17,7 @@ int sys_clone_wrapper(int, int, int, int, int);
 int sys_cacheflush(uint32_t, uint32_t uint32_t);
 int sys_arc_settls(void *);
 int sys_arc_gettls(void);
+int sys_arc_usr_cmpxchg(int *, int, int);
 
 #include <asm-generic/syscalls.h>
 
diff --git a/arch/arc/include/uapi/asm/unistd.h b/arch/arc/include/uapi/asm/unistd.h
index 41fa2ec9e02c..9a34136d84b2 100644
--- a/arch/arc/include/uapi/asm/unistd.h
+++ b/arch/arc/include/uapi/asm/unistd.h
@@ -27,18 +27,19 @@
 
 #define NR_syscalls	__NR_syscalls
 
+/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */
+#define __NR_sysfs		(__NR_arch_specific_syscall + 3)
+
 /* ARC specific syscall */
 #define __NR_cacheflush		(__NR_arch_specific_syscall + 0)
 #define __NR_arc_settls		(__NR_arch_specific_syscall + 1)
 #define __NR_arc_gettls		(__NR_arch_specific_syscall + 2)
+#define __NR_arc_usr_cmpxchg	(__NR_arch_specific_syscall + 4)
 
 __SYSCALL(__NR_cacheflush, sys_cacheflush)
 __SYSCALL(__NR_arc_settls, sys_arc_settls)
 __SYSCALL(__NR_arc_gettls, sys_arc_gettls)
-
-
-/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */
-#define __NR_sysfs		(__NR_arch_specific_syscall + 3)
+__SYSCALL(__NR_arc_usr_cmpxchg, sys_arc_usr_cmpxchg)
 __SYSCALL(__NR_sysfs, sys_sysfs)
 
 #undef __SYSCALL
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index be1972bd2729..59aa43cb146e 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -41,6 +41,39 @@ SYSCALL_DEFINE0(arc_gettls)
 	return task_thread_info(current)->thr_ptr;
 }
 
+SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
+{
+	int uval;
+	int ret;
+
+	/*
+	 * This is only for old cores lacking LLOCK/SCOND, which by defintion
+	 * can't possibly be SMP. Thus doesn't need to be SMP safe.
+	 * And this also helps reduce the overhead for serializing in
+	 * the UP case
+	 */
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP));
+
+	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
+		return -EFAULT;
+
+	preempt_disable();
+
+	ret = __get_user(uval, uaddr);
+	if (ret)
+		goto done;
+
+	if (uval != expected)
+		ret = -EAGAIN;
+	else
+		ret = __put_user(new, uaddr);
+
+done:
+	preempt_enable();
+
+	return ret;
+}
+
 void arch_cpu_idle(void)
 {
 	/* sleep, but enable all interrupts before committing */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH-v2] ARC: syscall for userspace cmpxchg assist
  2016-10-24 16:17                 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta
@ 2016-11-04 20:16                   ` Vineet Gupta
  2016-11-07 18:50                     ` [PATCH] ARC: tweak semantics of " Vineet Gupta
  0 siblings, 1 reply; 24+ messages in thread
From: Vineet Gupta @ 2016-11-04 20:16 UTC (permalink / raw)
  To: arcml
  Cc: Peter Zijlstra, linux-kernel, Colin Ian King, Alexey.Brodkin,
	Arnd Bergmann

On 10/24/2016 09:17 AM, Vineet Gupta wrote:
> Older ARC700 cores (ARC750 specifically) lack instructions to implement
> atomic r-w-w. This is problematic for userspace libraries such as NPTL
> which need atomic primitives. So enable them by providing kernel assist.
> This is costly but really the only sane soluton (othern than tight
> spinning using the otherwise avaialble atomic exchange EX instruciton).
> 
> Good thing is there are only a few of these cores running Linux out in
> the wild.
> 
> This only works on UP systems.
> 
> Reviewed-by: Colin Ian King <colin.king@canonical.com>
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
> Changes since v1
>  - errno not returned for access_ok() failing  [Colin]
>  - Beefed up change log
>  - WARN_ON_ONCE() for CONFIG_SMP since this is only UP safe
> ---
>  arch/arc/include/asm/syscalls.h    |  1 +
>  arch/arc/include/uapi/asm/unistd.h |  9 +++++----
>  arch/arc/kernel/process.c          | 33 +++++++++++++++++++++++++++++++++
>  3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arc/include/asm/syscalls.h b/arch/arc/include/asm/syscalls.h
> index e56f9fcc5581..772b67ca56e7 100644
> --- a/arch/arc/include/asm/syscalls.h
> +++ b/arch/arc/include/asm/syscalls.h
> @@ -17,6 +17,7 @@ int sys_clone_wrapper(int, int, int, int, int);
>  int sys_cacheflush(uint32_t, uint32_t uint32_t);
>  int sys_arc_settls(void *);
>  int sys_arc_gettls(void);
> +int sys_arc_usr_cmpxchg(int *, int, int);
>  
>  #include <asm-generic/syscalls.h>
>  
> diff --git a/arch/arc/include/uapi/asm/unistd.h b/arch/arc/include/uapi/asm/unistd.h
> index 41fa2ec9e02c..9a34136d84b2 100644
> --- a/arch/arc/include/uapi/asm/unistd.h
> +++ b/arch/arc/include/uapi/asm/unistd.h
> @@ -27,18 +27,19 @@
>  
>  #define NR_syscalls	__NR_syscalls
>  
> +/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */
> +#define __NR_sysfs		(__NR_arch_specific_syscall + 3)
> +
>  /* ARC specific syscall */
>  #define __NR_cacheflush		(__NR_arch_specific_syscall + 0)
>  #define __NR_arc_settls		(__NR_arch_specific_syscall + 1)
>  #define __NR_arc_gettls		(__NR_arch_specific_syscall + 2)
> +#define __NR_arc_usr_cmpxchg	(__NR_arch_specific_syscall + 4)
>  
>  __SYSCALL(__NR_cacheflush, sys_cacheflush)
>  __SYSCALL(__NR_arc_settls, sys_arc_settls)
>  __SYSCALL(__NR_arc_gettls, sys_arc_gettls)
> -
> -
> -/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */
> -#define __NR_sysfs		(__NR_arch_specific_syscall + 3)
> +__SYSCALL(__NR_arc_usr_cmpxchg, sys_arc_usr_cmpxchg)
>  __SYSCALL(__NR_sysfs, sys_sysfs)
>  
>  #undef __SYSCALL
> diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
> index be1972bd2729..59aa43cb146e 100644
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -41,6 +41,39 @@ SYSCALL_DEFINE0(arc_gettls)
>  	return task_thread_info(current)->thr_ptr;
>  }
>  
> +SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
> +{
> +	int uval;
> +	int ret;
> +
> +	/*
> +	 * This is only for old cores lacking LLOCK/SCOND, which by defintion
> +	 * can't possibly be SMP. Thus doesn't need to be SMP safe.
> +	 * And this also helps reduce the overhead for serializing in
> +	 * the UP case
> +	 */
> +	WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP));
> +
> +	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
> +		return -EFAULT;
> +
> +	preempt_disable();
> +
> +	ret = __get_user(uval, uaddr);
> +	if (ret)
> +		goto done;
> +
> +	if (uval != expected)
> +		ret = -EAGAIN;
> +	else
> +		ret = __put_user(new, uaddr);
> +
> +done:
> +	preempt_enable();
> +
> +	return ret;
> +}

It seems there is a subtle issue with this interface. Userspace cares more about
"prev" value to be able to build it's own state machine(s) - my existing uclibc
code was flawed as it was tight looping on the errno result.

We can add a return param, by passing a pointer, but I think it would be better
(and slightly cheaper) to just ditch the errno and simply return the prev value
which and current value could be checked for success/fail decision etc.

-Vineet

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] ARC: tweak semantics of userspace cmpxchg assist
  2016-11-04 20:16                   ` Vineet Gupta
@ 2016-11-07 18:50                     ` Vineet Gupta
  0 siblings, 0 replies; 24+ messages in thread
From: Vineet Gupta @ 2016-11-07 18:50 UTC (permalink / raw)
  To: linux-snps-arc; +Cc: Alexey.Brodkin, linux-kernel, Vineet Gupta

The original code only used to return errno to indicate if cmpxchg
succeeded. It was not returning the "previous" value which typical cmpxhg
callers are interested in to build their slowpaths or retry loops.
Given user preemption in syscall return path etc, it is not wise to
check this in userspace afterwards, but should what kernel actually
observed in the syscall.

So change the syscall interface to always return the previous value and
additionally set Z flag to indicate whether operation succeeded or not
(just like ARM implementation when they used ot have this syscall)
The flag approach avoids having to put_user errno and specially when
the use case for this syscall cares mostly about the "previous" value.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/include/asm/arcregs.h |  2 ++
 arch/arc/kernel/process.c      | 20 +++++++++++---------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h
index 7f3f9f63708c..1bd24ec3e350 100644
--- a/arch/arc/include/asm/arcregs.h
+++ b/arch/arc/include/asm/arcregs.h
@@ -43,12 +43,14 @@
 #define STATUS_AE_BIT		5	/* Exception active */
 #define STATUS_DE_BIT		6	/* PC is in delay slot */
 #define STATUS_U_BIT		7	/* User/Kernel mode */
+#define STATUS_Z_BIT            11
 #define STATUS_L_BIT		12	/* Loop inhibit */
 
 /* These masks correspond to the status word(STATUS_32) bits */
 #define STATUS_AE_MASK		(1<<STATUS_AE_BIT)
 #define STATUS_DE_MASK		(1<<STATUS_DE_BIT)
 #define STATUS_U_MASK		(1<<STATUS_U_BIT)
+#define STATUS_Z_MASK		(1<<STATUS_Z_BIT)
 #define STATUS_L_MASK		(1<<STATUS_L_BIT)
 
 /*
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 59aa43cb146e..a41a79a4f4fe 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -43,8 +43,8 @@ SYSCALL_DEFINE0(arc_gettls)
 
 SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
 {
-	int uval;
-	int ret;
+	struct pt_regs *regs = current_pt_regs();
+	int uval = -EFAULT;
 
 	/*
 	 * This is only for old cores lacking LLOCK/SCOND, which by defintion
@@ -54,24 +54,26 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new)
 	 */
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP));
 
+	/* Z indicates to userspace if operation succeded */
+	regs->status32 &= ~STATUS_Z_MASK;
+
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
 		return -EFAULT;
 
 	preempt_disable();
 
-	ret = __get_user(uval, uaddr);
-	if (ret)
+	if (__get_user(uval, uaddr))
 		goto done;
 
-	if (uval != expected)
-		ret = -EAGAIN;
-	else
-		ret = __put_user(new, uaddr);
+	if (uval == expected) {
+		if (!__put_user(new, uaddr))
+			regs->status32 |= STATUS_Z_MASK;
+	}
 
 done:
 	preempt_enable();
 
-	return ret;
+	return uval;
 }
 
 void arch_cpu_idle(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-11-07 18:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com>
2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
2015-10-18 11:15   ` Alexey Brodkin
2015-10-18 23:14     ` Andi Kleen
2015-10-19  4:58       ` Vineet Gupta
2015-10-19  5:49         ` Andi Kleen
2015-10-19  9:28           ` Vineet Gupta
2015-10-19  9:35             ` Peter Zijlstra
2015-10-19  9:46               ` Vineet Gupta
2015-10-19  9:51                 ` Peter Zijlstra
2015-10-19 10:04                   ` Vineet Gupta
2015-10-20  8:00                   ` Vineet Gupta
2015-10-20 10:11                     ` Peter Zijlstra
2015-10-20 10:45                       ` Vineet Gupta
2015-10-29 15:58                         ` Alexey Brodkin
2015-10-30  6:19                           ` Vineet Gupta
2016-02-03 16:20                             ` Alexey Brodkin
     [not found]                             ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>
2016-02-04  4:13                               ` Vineet Gupta
2016-02-05 11:18                                 ` Noam Camus
     [not found]                                   ` <20160205161027.GG28242@kernel.org>
2016-02-10  3:09                                     ` Vineet Gupta
2016-10-18 18:58               ` Vineet Gupta
2016-10-24 16:17                 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta
2016-11-04 20:16                   ` Vineet Gupta
2016-11-07 18:50                     ` [PATCH] ARC: tweak semantics of " Vineet Gupta
2015-10-30  6:21 ` [RFC] perf: fix building for ARCv1 Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).