* Re: [RFC] perf: fix building for ARCv1 [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> @ 2015-10-17 14:19 ` Vineet Gupta 2015-10-18 11:15 ` Alexey Brodkin 2015-10-30 6:21 ` [RFC] perf: fix building for ARCv1 Vineet Gupta 1 sibling, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-17 14:19 UTC (permalink / raw) To: Alexey Brodkin Cc: Peter Zijlstra, Aabid Rushdi, lkml, linux-perf-users, Darren Hart On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > Perf uses atomic options and so it is required to have atomics enabled > in toolchain. > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > ARCv1. Now we explicitly enable atomics for either ARC achitecture > version so perf could be successfully built. > > Currently on attempt to build perf for ARCv1 you'll see tons of: > ----------------->8----------------- > undefined reference to `__sync_add_and_fetch_4' > ----------------->8----------------- > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > execution once "llock" instruction is attempted to be executed. Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o that support. So what we are saying is that any arch (or a configuration thereof) which doesn't support atomic r-m-w can't even build perf now - that sucks ! A better way would be to do feature test for __sync_xyz and make atomic_xxx wrappers call __sync_xyz) vs. an empty stub. So atleast such arches can build and do "some" perf work ! -Vineet > Cc: Vineet Gupta <vgupta@synopsys.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> > --- > tools/perf/config/Makefile | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 38a0853..dc7c0a8 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > endif > > +# Additional ARCH settings for ARC > +ifeq ($(ARCH),arc) > + CFLAGS += -matomic > +endif > + > ifeq ($(NO_PERF_REGS),0) > $(call detected,CONFIG_PERF_REGS) > endif ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta @ 2015-10-18 11:15 ` Alexey Brodkin 2015-10-18 23:14 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Alexey Brodkin @ 2015-10-18 11:15 UTC (permalink / raw) To: Vineet Gupta Cc: Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart, dsahern, acme [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3287 bytes --] Hi Vineet, Looks like this time atomics are a must. And that really sucks! See these commits that introduce usage of atomic_xxx() all around the perf and tools it uses: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f812d3045c2385ac16237e68b156859c4005526e http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d3a7c489c7fd2463e3b2c3a2179c7be879dd9cb4 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7143849a5d6a5c623d81790d92f0033507c5b14f http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=59a51c1dc9fbb3fb4af928b852d7b35df83edd74 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e1ed3a5b87ed6759e16ec93f16aae83d2cc77ca2 and that's the one that introduced usage of the following generic gcc's atomics (__sync_add_and_fetch/__sync_sub_and_fetch): http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da6d8567512df11e0473b710c07de87efde5709c So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with perf building. -Alexey On Sat, 2015-10-17 at 14:19 +0000, Vineet Gupta wrote: > On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > > Perf uses atomic options and so it is required to have atomics enabled > > in toolchain. > > > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > > ARCv1. Now we explicitly enable atomics for either ARC achitecture > > version so perf could be successfully built. > > > > Currently on attempt to build perf for ARCv1 you'll see tons of: > > ----------------->8----------------- > > undefined reference to `__sync_add_and_fetch_4' > > ----------------->8----------------- > > > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > > execution once "llock" instruction is attempted to be executed. > > Ok this fixes ARCompact - assuming it will have LL/SC. We do have old SoCs w/o > that support. > So what we are saying is that any arch (or a configuration thereof) which doesn't > support atomic r-m-w can't even build perf now - that sucks ! > > A better way would be to do feature test for __sync_xyz and make atomic_xxx > wrappers call __sync_xyz) vs. an empty stub. > So atleast such arches can build and do "some" perf work ! > > -Vineet > > > Cc: Vineet Gupta <vgupta@synopsys.com> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> > > --- > > tools/perf/config/Makefile | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > > index 38a0853..dc7c0a8 100644 > > --- a/tools/perf/config/Makefile > > +++ b/tools/perf/config/Makefile > > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > > endif > > > > +# Additional ARCH settings for ARC > > +ifeq ($(ARCH),arc) > > + CFLAGS += -matomic > > +endif > > + > > ifeq ($(NO_PERF_REGS),0) > > $(call detected,CONFIG_PERF_REGS) > > endif > > ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-18 11:15 ` Alexey Brodkin @ 2015-10-18 23:14 ` Andi Kleen 2015-10-19 4:58 ` Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2015-10-18 23:14 UTC (permalink / raw) To: Alexey Brodkin Cc: Vineet Gupta, Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart, dsahern, acme Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes: > > So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with > perf building. If your target is single cpu only you can always simulate them in C. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-18 23:14 ` Andi Kleen @ 2015-10-19 4:58 ` Vineet Gupta 2015-10-19 5:49 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-19 4:58 UTC (permalink / raw) To: Andi Kleen, Alexey Brodkin Cc: Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart, dsahern, acme On Monday 19 October 2015 04:45 AM, Andi Kleen wrote: > Alexey Brodkin <Alexey.Brodkin@synopsys.com> writes: >> So the best we may do is to implement detection of atomics in the toolchain and if there's no atomics hard stop with >> perf building. > If your target is single cpu only you can always simulate them in C. > > -Andi But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in C just by itself. -Vineet ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 4:58 ` Vineet Gupta @ 2015-10-19 5:49 ` Andi Kleen 2015-10-19 9:28 ` Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2015-10-19 5:49 UTC (permalink / raw) To: Vineet Gupta Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart, dsahern, acme Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: > > But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in > C just by itself. It matters when you access the perf ring buffer which is updated by kernel. Also perf is now multi threaded to some degree. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 5:49 ` Andi Kleen @ 2015-10-19 9:28 ` Vineet Gupta 2015-10-19 9:35 ` Peter Zijlstra 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-19 9:28 UTC (permalink / raw) To: Andi Kleen Cc: Alexey Brodkin, Aabid Rushdi, linux-kernel, peterz, linux-perf-users, dvhart, dsahern, acme On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >> C just by itself. > It matters when you access the perf ring buffer which is updated by kernel. That's part of the problem. The issue is with atomic_* APIs proliferation in perf user space code which assumes native atomix r-m-w support which is not always true. So I think we still need a feature detection mechanism and if absent leave the ball in arch court by calling arch_atomic_* which can use creative or half working measures so perf will work to some extent atleast and not bomb outright. Also can u please elaborate a bit on "simulate them in C" - u mean just simple unprotected LD, OP, ST or do u fancy usage of futex etc? > Also perf is now multi threaded to some degree. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:28 ` Vineet Gupta @ 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta 2016-10-18 18:58 ` Vineet Gupta 0 siblings, 2 replies; 24+ messages in thread From: Peter Zijlstra @ 2015-10-19 9:35 UTC (permalink / raw) To: Vineet Gupta Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel, linux-perf-users, dvhart, dsahern, acme On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in > >> C just by itself. > > It matters when you access the perf ring buffer which is updated by kernel. > > That's part of the problem. The issue is with atomic_* APIs proliferation in perf > user space code which assumes native atomix r-m-w support which is not always > true. So I think we still need a feature detection mechanism and if absent leave > the ball in arch court by calling arch_atomic_* which can use creative or half > working measures so perf will work to some extent atleast and not bomb outright. > > Also can u please elaborate a bit on "simulate them in C" - u mean just simple > unprotected LD, OP, ST or do u fancy usage of futex etc? Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an IRQ-disabled load-op-store sequence. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:35 ` Peter Zijlstra @ 2015-10-19 9:46 ` Vineet Gupta 2015-10-19 9:51 ` Peter Zijlstra 2016-10-18 18:58 ` Vineet Gupta 1 sibling, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-19 9:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel, linux-perf-users, dvhart, dsahern, acme On Monday 19 October 2015 03:05 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: >> > On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: >>> > > Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >>>> > >> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >>>> > >> C just by itself. >>> > > It matters when you access the perf ring buffer which is updated by kernel. >> > >> > That's part of the problem. The issue is with atomic_* APIs proliferation in perf >> > user space code which assumes native atomix r-m-w support which is not always >> > true. So I think we still need a feature detection mechanism and if absent leave >> > the ball in arch court by calling arch_atomic_* which can use creative or half >> > working measures so perf will work to some extent atleast and not bomb outright. >> > >> > Also can u please elaborate a bit on "simulate them in C" - u mean just simple >> > unprotected LD, OP, ST or do u fancy usage of futex etc? > Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an > IRQ-disabled load-op-store sequence. Yeah I remember seeing some syscall like that in ARM. On ARC we could use the atomic EXchange to implement a user space only binary semaphore - these atomic ops will be small duration so it is OK to spin wait for a little bit. That's how the old pthread library worked for ARC w/o any atomic support. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:46 ` Vineet Gupta @ 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 0 siblings, 2 replies; 24+ messages in thread From: Peter Zijlstra @ 2015-10-19 9:51 UTC (permalink / raw) To: Vineet Gupta Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel, linux-perf-users, dvhart, dsahern, acme On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: > On ARC we could use the atomic EXchange to implement a user space only binary > semaphore - these atomic ops will be small duration so it is OK to spin wait for a > little bit. That's how the old pthread library worked for ARC w/o any atomic support. That has the obvious problem of lock-holder-preemption and the horrible performance issues that result from that. I think the syscall at least has deterministic behaviour, whereas that userspace spin loop has this abysmal worst case thing. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:51 ` Peter Zijlstra @ 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 1 sibling, 0 replies; 24+ messages in thread From: Vineet Gupta @ 2015-10-19 10:04 UTC (permalink / raw) To: Peter Zijlstra Cc: Andi Kleen, Alexey Brodkin, Aabid Rushdi, linux-kernel, linux-perf-users, dvhart, dsahern, acme On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: >> > On ARC we could use the atomic EXchange to implement a user space only binary >> > semaphore - these atomic ops will be small duration so it is OK to spin wait for a >> > little bit. That's how the old pthread library worked for ARC w/o any atomic support. > That has the obvious problem of lock-holder-preemption and the horrible > performance issues that result from that. > > I think the syscall at least has deterministic behaviour, whereas that > userspace spin loop has this abysmal worst case thing. I agree - we can add that syscall trivially and use it based on build time feature detection for atomics ! ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta @ 2015-10-20 8:00 ` Vineet Gupta 2015-10-20 10:11 ` Peter Zijlstra 1 sibling, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-20 8:00 UTC (permalink / raw) To: Peter Zijlstra, Noam Camus Cc: Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel, linux-perf-users, dvhart, dsahern, acme On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: >> On ARC we could use the atomic EXchange to implement a user space only binary >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a >> little bit. That's how the old pthread library worked for ARC w/o any atomic support. > That has the obvious problem of lock-holder-preemption and the horrible > performance issues that result from that. > > I think the syscall at least has deterministic behaviour, whereas that > userspace spin loop has this abysmal worst case thing. I don't have issue with adding the syscall per-se. But that comes with it's own headaches of ABI change - more importantly it requires several things to match, libc, kernel... It would be easier if change was confined to say perf. Can we use existing syscall(s) - again this is what our good old pthread library code did. static void __pthread_acquire(int * spinlock) { int cnt = 0; struct timespec tm; READ_MEMORY_BARRIER(); while (testandset(spinlock)) { <---- atomic EXchange if (cnt < 50) { sched_yield(); cnt++; } else { tm.tv_sec = 0; tm.tv_nsec = 2000001; nanosleep(&tm, ((void *)0)); cnt = 0; } } ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 8:00 ` Vineet Gupta @ 2015-10-20 10:11 ` Peter Zijlstra 2015-10-20 10:45 ` Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Peter Zijlstra @ 2015-10-20 10:11 UTC (permalink / raw) To: Vineet Gupta Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel, linux-perf-users, dvhart, dsahern, acme, Thomas Gleixner On Tue, Oct 20, 2015 at 08:00:46AM +0000, Vineet Gupta wrote: > On Monday 19 October 2015 03:22 PM, Peter Zijlstra wrote: > > On Mon, Oct 19, 2015 at 09:46:35AM +0000, Vineet Gupta wrote: > >> On ARC we could use the atomic EXchange to implement a user space only binary > >> semaphore - these atomic ops will be small duration so it is OK to spin wait for a > >> little bit. That's how the old pthread library worked for ARC w/o any atomic support. > > That has the obvious problem of lock-holder-preemption and the horrible > > performance issues that result from that. > > > > I think the syscall at least has deterministic behaviour, whereas that > > userspace spin loop has this abysmal worst case thing. > > I don't have issue with adding the syscall per-se. But that comes with it's own > headaches of ABI change - more importantly it requires several things to match, > libc, kernel... It would be easier if change was confined to say perf. OTOH fixing all those would get you a 'sane' system :-) > Can we use existing syscall(s) - again this is what our good old pthread library > code did. > > static void __pthread_acquire(int * spinlock) > { > int cnt = 0; > struct timespec tm; > > READ_MEMORY_BARRIER(); > > while (testandset(spinlock)) { <---- atomic EXchange > if (cnt < 50) { > sched_yield(); > cnt++; > } else { > tm.tv_sec = 0; > tm.tv_nsec = 2000001; > nanosleep(&tm, ((void *)0)); > cnt = 0; > } > } *shudder* that is quite horrible. This means all your 'atomics' are broken for anything SCHED_FIFO and the like. You simply _cannot_ run a realtime system. (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the test-and-set control dependency.) But its your arch.. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 10:11 ` Peter Zijlstra @ 2015-10-20 10:45 ` Vineet Gupta 2015-10-29 15:58 ` Alexey Brodkin 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2015-10-20 10:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Noam Camus, Andi Kleen, Alexey Brodkin, Gilad Ben Yossef, linux-kernel, linux-perf-users, dvhart, dsahern, acme, Thomas Gleixner On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: >> > Can we use existing syscall(s) - again this is what our good old pthread library >> > code did. >> > >> > static void __pthread_acquire(int * spinlock) >> > { >> > int cnt = 0; >> > struct timespec tm; >> > >> > READ_MEMORY_BARRIER(); >> > >> > while (testandset(spinlock)) { <---- atomic EXchange >> > if (cnt < 50) { >> > sched_yield(); >> > cnt++; >> > } else { >> > tm.tv_sec = 0; >> > tm.tv_nsec = 2000001; >> > nanosleep(&tm, ((void *)0)); >> > cnt = 0; >> > } >> > } > *shudder* that is quite horrible. > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > like. You simply _cannot_ run a realtime system. The code above is from uClibc old threading library which we don't use anymore. The NPTL version doesn't have all of this song-n-dance and relies on futexes. The change we are talking about is only for the atomics in perf itself. I do understand your POV though. > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > test-and-set control dependency.) Absolutely and in this case it will have to be added both inside the loop and one at the end to cover both the scenarios ! ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-20 10:45 ` Vineet Gupta @ 2015-10-29 15:58 ` Alexey Brodkin 2015-10-30 6:19 ` Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Alexey Brodkin @ 2015-10-29 15:58 UTC (permalink / raw) To: Vineet Gupta Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme, linux-perf-users, giladb, dsahern, dvhart, noamc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1769 bytes --] Hi Vineet, On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: > > > > Can we use existing syscall(s) - again this is what our good old pthread library > > > > code did. > > > > > > > > static void __pthread_acquire(int * spinlock) > > > > { > > > > int cnt = 0; > > > > struct timespec tm; > > > > > > > > READ_MEMORY_BARRIER(); > > > > > > > > while (testandset(spinlock)) { <---- atomic EXchange > > > > if (cnt < 50) { > > > > sched_yield(); > > > > cnt++; > > > > } else { > > > > tm.tv_sec = 0; > > > > tm.tv_nsec = 2000001; > > > > nanosleep(&tm, ((void *)0)); > > > > cnt = 0; > > > > } > > > > } > > *shudder* that is quite horrible. > > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > > like. You simply _cannot_ run a realtime system. > > The code above is from uClibc old threading library which we don't use anymore. > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The > change we are talking about is only for the atomics in perf itself. I do > understand your POV though. > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > > test-and-set control dependency.) > > Absolutely and in this case it will have to be added both inside the loop and one > at the end to cover both the scenarios ! > I'm wondering what are our plans for now? Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) or we'll try to come up with more general solution? -Alexeyÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-29 15:58 ` Alexey Brodkin @ 2015-10-30 6:19 ` Vineet Gupta 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 0 siblings, 2 replies; 24+ messages in thread From: Vineet Gupta @ 2015-10-30 6:19 UTC (permalink / raw) To: Alexey Brodkin Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme, linux-perf-users, giladb, dsahern, dvhart, noamc On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote: > Hi Vineet, > > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: >> On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: >>>>> Can we use existing syscall(s) - again this is what our good old pthread library >>>>> code did. >>>>> >>>>> static void __pthread_acquire(int * spinlock) >>>>> { >>>>> int cnt = 0; >>>>> struct timespec tm; >>>>> >>>>> READ_MEMORY_BARRIER(); >>>>> >>>>> while (testandset(spinlock)) { <---- atomic EXchange >>>>> if (cnt < 50) { >>>>> sched_yield(); >>>>> cnt++; >>>>> } else { >>>>> tm.tv_sec = 0; >>>>> tm.tv_nsec = 2000001; >>>>> nanosleep(&tm, ((void *)0)); >>>>> cnt = 0; >>>>> } >>>>> } >>> *shudder* that is quite horrible. >>> >>> This means all your 'atomics' are broken for anything SCHED_FIFO and the >>> like. You simply _cannot_ run a realtime system. >> The code above is from uClibc old threading library which we don't use anymore. >> The NPTL version doesn't have all of this song-n-dance and relies on futexes. The >> change we are talking about is only for the atomics in perf itself. I do >> understand your POV though. >> >>> (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the >>> test-and-set control dependency.) >> Absolutely and in this case it will have to be added both inside the loop and one >> at the end to cover both the scenarios ! >> > I'm wondering what are our plans for now? > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) > or we'll try to come up with more general solution? I agree with the current solution to add -atomic to for arc700 builds. Although making that default for arc700 tools will be better but that will not fix things before next release of tools etc. But we *do* need to improve generic solution 1. Add atomics detection in perf to add fall back arch stubs 2. ARC needs to add syscall for facilitating atomic r-m-w ! -Vineet ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-30 6:19 ` Vineet Gupta @ 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 1 sibling, 0 replies; 24+ messages in thread From: Alexey Brodkin @ 2016-02-03 16:20 UTC (permalink / raw) To: Vineet Gupta Cc: linux-kernel, peterz, tglx, andi, linux-snps-arc, acme, linux-perf-users, giladb, dsahern, dvhart, noamc Hi Vineet, On Fri, 2015-10-30 at 06:19 +0000, Vineet Gupta wrote: > On Thursday 29 October 2015 09:28 PM, Alexey Brodkin wrote: > > Hi Vineet, > > > > On Tue, 2015-10-20 at 10:45 +0000, Vineet Gupta wrote: > > > On Tuesday 20 October 2015 03:41 PM, Peter Zijlstra wrote: > > > > > > Can we use existing syscall(s) - again this is what our good old pthread library > > > > > > code did. > > > > > > > > > > > > static void __pthread_acquire(int * spinlock) > > > > > > { > > > > > > int cnt = 0; > > > > > > struct timespec tm; > > > > > > > > > > > > READ_MEMORY_BARRIER(); > > > > > > > > > > > > while (testandset(spinlock)) { <---- atomic EXchange > > > > > > if (cnt < 50) { > > > > > > sched_yield(); > > > > > > cnt++; > > > > > > } else { > > > > > > tm.tv_sec = 0; > > > > > > tm.tv_nsec = 2000001; > > > > > > nanosleep(&tm, ((void *)0)); > > > > > > cnt = 0; > > > > > > } > > > > > > } > > > > *shudder* that is quite horrible. > > > > > > > > This means all your 'atomics' are broken for anything SCHED_FIFO and the > > > > like. You simply _cannot_ run a realtime system. > > > The code above is from uClibc old threading library which we don't use anymore. > > > The NPTL version doesn't have all of this song-n-dance and relies on futexes. The > > > change we are talking about is only for the atomics in perf itself. I do > > > understand your POV though. > > > > > > > (also, for ACQUIRE you want the READ_MEMORY_BARRIER() _after_ the > > > > test-and-set control dependency.) > > > Absolutely and in this case it will have to be added both inside the loop and one > > > at the end to cover both the scenarios ! > > > > > I'm wondering what are our plans for now? > > Are we going to accept proposed fix just for ARC in 4.4 (and to all stables then) > > or we'll try to come up with more general solution? > > I agree with the current solution to add -atomic to for arc700 builds. > Although making that default for arc700 tools will be better but that will not fix > things before next release of tools etc. > > But we *do* need to improve generic solution > 1. Add atomics detection in perf to add fall back arch stubs > 2. ARC needs to add syscall for facilitating atomic r-m-w ! So the most recent ARC GNU tools (2015.12) were just released yesterday and still atomics are disabled by default for ARCv1. That means perf will continue to fail on building for now. Do you think we may apply my initial fix to 4.5 while it is in development and then to stable trees as well? -Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com>]
* Re: [RFC] perf: fix building for ARCv1 [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> @ 2016-02-04 4:13 ` Vineet Gupta 2016-02-05 11:18 ` Noam Camus 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2016-02-04 4:13 UTC (permalink / raw) To: Alexey Brodkin, Noam Camus Cc: linux-perf-users, peterz, dvhart, linux-kernel, acme, andi, noamc, dsahern, tglx, linux-snps-arc, giladb +CC Noam On Wednesday 03 February 2016 09:50 PM, Alexey Brodkin wrote: >> I agree with the current solution to add -atomic to for arc700 builds. >> > Although making that default for arc700 tools will be better but that will not fix >> > things before next release of tools etc. >> > >> > But we *do* need to improve generic solution >> > 1. Add atomics detection in perf to add fall back arch stubs >> > 2. ARC needs to add syscall for facilitating atomic r-m-w ! > So the most recent ARC GNU tools (2015.12) were just released yesterday > and still atomics are disabled by default for ARCv1. > That means perf will continue to fail on building for now. > > Do you think we may apply my initial fix to 4.5 while it is in development and > then to stable trees as well? Noam, what's the atomic story for EZChip. Do you support such things for user space in GNU tools. If -atomic is added to perf user space builds are you guys OK! -Vineet ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2016-02-04 4:13 ` Vineet Gupta @ 2016-02-05 11:18 ` Noam Camus [not found] ` <20160205161027.GG28242@kernel.org> 0 siblings, 1 reply; 24+ messages in thread From: Noam Camus @ 2016-02-05 11:18 UTC (permalink / raw) To: Vineet Gupta, Alexey Brodkin Cc: linux-perf-users, peterz, dvhart, linux-kernel, acme, andi, dsahern, tglx, linux-snps-arc, Gilad Ben Yossef ________________________________________ >From: Vineet Gupta <Vineet.Gupta1@synopsys.com> >Sent: Thursday, February 4, 2016 6:13 AM >Noam, what's the atomic story for EZChip. Do you support such things for user space in GNU tools. If -atomic is added to perf user space builds are you guys OK! Well here for EZchip I also see the: undefined reference to `__sync_add_and_fetch_4' undefined reference to `__sync_sub_and_fetch_4' This is since at file tools/include/asm/atomic.h we use the generic implementation If for ARC I could use just like x86 my own header file then functions like: atomic_inc() atomic_dec_and_test() Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform. You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*. I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC. So seem that either I have my own header at kernel or that I need to change the GCC implementation for __sync* to use my atomic instructions. I am personally tend to the x86 solution and not the generic one since changing GCC will require to have new compiler dependency. -Noam ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20160205161027.GG28242@kernel.org>]
* Re: [RFC] perf: fix building for ARCv1 [not found] ` <20160205161027.GG28242@kernel.org> @ 2016-02-10 3:09 ` Vineet Gupta 0 siblings, 0 replies; 24+ messages in thread From: Vineet Gupta @ 2016-02-10 3:09 UTC (permalink / raw) To: acme, Noam Camus Cc: peterz, Alexey Brodkin, linux-kernel, linux-perf-users, andi, dsahern, tglx, linux-snps-arc, dvhart, Gilad Ben Yossef On Friday 05 February 2016 09:40 PM, acme@redhat.com wrote: > Em Fri, Feb 05, 2016 at 11:18:52AM +0000, Noam Camus escreveu: >> Well here for EZchip I also see the: >> undefined reference to `__sync_add_and_fetch_4' >> undefined reference to `__sync_sub_and_fetch_4' > > Yeah, because there is no: tools/arch/arc/include/asm/atomic.h, can't > you guys adapt arch/arc/include/asm/atomic.h to use in userspace? Sure - however we need to support 3 variants: LLSC, !LLSC, EZCHIP If needed, latter 2 could be done using a new atomic assist syscall I presume kernel Kconfig items are no go in this header so this diversity management needs to use toolchain defined macros e.g. __ezchip__ > > - Arnaldo > >> This is since at file tools/include/asm/atomic.h we use the generic implementation >> If for ARC I could use just like x86 my own header file then functions like: >> atomic_inc() >> atomic_dec_and_test() >> Are easy to implement and you may see an example for such atomic methods in my patch set for the new platform. >> >> You however wants to use some GCC flag -matomic which I assume somehow will implement the above __sync*. >> I can't find the implementation but if it uses LLSC then it won't work for me since I am not supporting LLSC. > >> So seem that either I have my own header at kernel or that I need to >> change the GCC implementation for __sync* to use my atomic >> instructions. I am personally tend to the x86 solution and not the >> generic one since changing GCC will require to have new compiler >> dependency. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta @ 2016-10-18 18:58 ` Vineet Gupta 2016-10-24 16:17 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta 1 sibling, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2016-10-18 18:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Alexey Brodkin, linux-kernel, linux-perf-users, Russell King, arcml, acme On 10/19/2015 02:35 AM, Peter Zijlstra wrote: > On Mon, Oct 19, 2015 at 09:28:43AM +0000, Vineet Gupta wrote: >> On Monday 19 October 2015 11:20 AM, Andi Kleen wrote: >>> Vineet Gupta <Vineet.Gupta1@synopsys.com> writes: >>>> But this user space - so IMHO UP/SMP doesn't matter and we can't simulate them in >>>> C just by itself. >>> It matters when you access the perf ring buffer which is updated by kernel. >> That's part of the problem. The issue is with atomic_* APIs proliferation in perf >> user space code which assumes native atomix r-m-w support which is not always >> true. So I think we still need a feature detection mechanism and if absent leave >> the ball in arch court by calling arch_atomic_* which can use creative or half >> working measures so perf will work to some extent atleast and not bomb outright. >> >> Also can u please elaborate a bit on "simulate them in C" - u mean just simple >> unprotected LD, OP, ST or do u fancy usage of futex etc? > Doesn't ARMv5 have a cmpxchg syscall to deal with this? It does an > IRQ-disabled load-op-store sequence. So I got around to addressing this - now that someone actually is trying to use NPTL (which uses llock/scond) on ARC700 lacking those instructions. However given that we are going this route, FWIW ARM kernel got rid of this syscall with db695c0509d6ec ("ARM: remove user cmpxchg syscall") citing some security hole. Even of we were to disregard, the code at the time had some open code MM trickery, which I'd rather not replicate. My use case is simple - I only need to support UP config - and a simple {get,put}_user would suffice - given that that can potentially take a TLB refill Miss or worse still a full page fault. I'm going to cook that patch to add that syscall, but wanted to get some thoughts ahead of that. -Vineet ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH-v2] ARC: syscall for userspace cmpxchg assist 2016-10-18 18:58 ` Vineet Gupta @ 2016-10-24 16:17 ` Vineet Gupta 2016-11-04 20:16 ` Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2016-10-24 16:17 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-snps-arc, linux-kernel, Colin Ian King, Alexey.Brodkin, Arnd Bergmann, Vineet Gupta Older ARC700 cores (ARC750 specifically) lack instructions to implement atomic r-w-w. This is problematic for userspace libraries such as NPTL which need atomic primitives. So enable them by providing kernel assist. This is costly but really the only sane soluton (othern than tight spinning using the otherwise avaialble atomic exchange EX instruciton). Good thing is there are only a few of these cores running Linux out in the wild. This only works on UP systems. Reviewed-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> --- Changes since v1 - errno not returned for access_ok() failing [Colin] - Beefed up change log - WARN_ON_ONCE() for CONFIG_SMP since this is only UP safe --- arch/arc/include/asm/syscalls.h | 1 + arch/arc/include/uapi/asm/unistd.h | 9 +++++---- arch/arc/kernel/process.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 39 insertions(+), 4 deletions(-) diff --git a/arch/arc/include/asm/syscalls.h b/arch/arc/include/asm/syscalls.h index e56f9fcc5581..772b67ca56e7 100644 --- a/arch/arc/include/asm/syscalls.h +++ b/arch/arc/include/asm/syscalls.h @@ -17,6 +17,7 @@ int sys_clone_wrapper(int, int, int, int, int); int sys_cacheflush(uint32_t, uint32_t uint32_t); int sys_arc_settls(void *); int sys_arc_gettls(void); +int sys_arc_usr_cmpxchg(int *, int, int); #include <asm-generic/syscalls.h> diff --git a/arch/arc/include/uapi/asm/unistd.h b/arch/arc/include/uapi/asm/unistd.h index 41fa2ec9e02c..9a34136d84b2 100644 --- a/arch/arc/include/uapi/asm/unistd.h +++ b/arch/arc/include/uapi/asm/unistd.h @@ -27,18 +27,19 @@ #define NR_syscalls __NR_syscalls +/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */ +#define __NR_sysfs (__NR_arch_specific_syscall + 3) + /* ARC specific syscall */ #define __NR_cacheflush (__NR_arch_specific_syscall + 0) #define __NR_arc_settls (__NR_arch_specific_syscall + 1) #define __NR_arc_gettls (__NR_arch_specific_syscall + 2) +#define __NR_arc_usr_cmpxchg (__NR_arch_specific_syscall + 4) __SYSCALL(__NR_cacheflush, sys_cacheflush) __SYSCALL(__NR_arc_settls, sys_arc_settls) __SYSCALL(__NR_arc_gettls, sys_arc_gettls) - - -/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */ -#define __NR_sysfs (__NR_arch_specific_syscall + 3) +__SYSCALL(__NR_arc_usr_cmpxchg, sys_arc_usr_cmpxchg) __SYSCALL(__NR_sysfs, sys_sysfs) #undef __SYSCALL diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index be1972bd2729..59aa43cb146e 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -41,6 +41,39 @@ SYSCALL_DEFINE0(arc_gettls) return task_thread_info(current)->thr_ptr; } +SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) +{ + int uval; + int ret; + + /* + * This is only for old cores lacking LLOCK/SCOND, which by defintion + * can't possibly be SMP. Thus doesn't need to be SMP safe. + * And this also helps reduce the overhead for serializing in + * the UP case + */ + WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP)); + + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) + return -EFAULT; + + preempt_disable(); + + ret = __get_user(uval, uaddr); + if (ret) + goto done; + + if (uval != expected) + ret = -EAGAIN; + else + ret = __put_user(new, uaddr); + +done: + preempt_enable(); + + return ret; +} + void arch_cpu_idle(void) { /* sleep, but enable all interrupts before committing */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH-v2] ARC: syscall for userspace cmpxchg assist 2016-10-24 16:17 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta @ 2016-11-04 20:16 ` Vineet Gupta 2016-11-07 18:50 ` [PATCH] ARC: tweak semantics of " Vineet Gupta 0 siblings, 1 reply; 24+ messages in thread From: Vineet Gupta @ 2016-11-04 20:16 UTC (permalink / raw) To: arcml Cc: Peter Zijlstra, linux-kernel, Colin Ian King, Alexey.Brodkin, Arnd Bergmann On 10/24/2016 09:17 AM, Vineet Gupta wrote: > Older ARC700 cores (ARC750 specifically) lack instructions to implement > atomic r-w-w. This is problematic for userspace libraries such as NPTL > which need atomic primitives. So enable them by providing kernel assist. > This is costly but really the only sane soluton (othern than tight > spinning using the otherwise avaialble atomic exchange EX instruciton). > > Good thing is there are only a few of these cores running Linux out in > the wild. > > This only works on UP systems. > > Reviewed-by: Colin Ian King <colin.king@canonical.com> > Signed-off-by: Vineet Gupta <vgupta@synopsys.com> > --- > Changes since v1 > - errno not returned for access_ok() failing [Colin] > - Beefed up change log > - WARN_ON_ONCE() for CONFIG_SMP since this is only UP safe > --- > arch/arc/include/asm/syscalls.h | 1 + > arch/arc/include/uapi/asm/unistd.h | 9 +++++---- > arch/arc/kernel/process.c | 33 +++++++++++++++++++++++++++++++++ > 3 files changed, 39 insertions(+), 4 deletions(-) > > diff --git a/arch/arc/include/asm/syscalls.h b/arch/arc/include/asm/syscalls.h > index e56f9fcc5581..772b67ca56e7 100644 > --- a/arch/arc/include/asm/syscalls.h > +++ b/arch/arc/include/asm/syscalls.h > @@ -17,6 +17,7 @@ int sys_clone_wrapper(int, int, int, int, int); > int sys_cacheflush(uint32_t, uint32_t uint32_t); > int sys_arc_settls(void *); > int sys_arc_gettls(void); > +int sys_arc_usr_cmpxchg(int *, int, int); > > #include <asm-generic/syscalls.h> > > diff --git a/arch/arc/include/uapi/asm/unistd.h b/arch/arc/include/uapi/asm/unistd.h > index 41fa2ec9e02c..9a34136d84b2 100644 > --- a/arch/arc/include/uapi/asm/unistd.h > +++ b/arch/arc/include/uapi/asm/unistd.h > @@ -27,18 +27,19 @@ > > #define NR_syscalls __NR_syscalls > > +/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */ > +#define __NR_sysfs (__NR_arch_specific_syscall + 3) > + > /* ARC specific syscall */ > #define __NR_cacheflush (__NR_arch_specific_syscall + 0) > #define __NR_arc_settls (__NR_arch_specific_syscall + 1) > #define __NR_arc_gettls (__NR_arch_specific_syscall + 2) > +#define __NR_arc_usr_cmpxchg (__NR_arch_specific_syscall + 4) > > __SYSCALL(__NR_cacheflush, sys_cacheflush) > __SYSCALL(__NR_arc_settls, sys_arc_settls) > __SYSCALL(__NR_arc_gettls, sys_arc_gettls) > - > - > -/* Generic syscall (fs/filesystems.c - lost in asm-generic/unistd.h */ > -#define __NR_sysfs (__NR_arch_specific_syscall + 3) > +__SYSCALL(__NR_arc_usr_cmpxchg, sys_arc_usr_cmpxchg) > __SYSCALL(__NR_sysfs, sys_sysfs) > > #undef __SYSCALL > diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c > index be1972bd2729..59aa43cb146e 100644 > --- a/arch/arc/kernel/process.c > +++ b/arch/arc/kernel/process.c > @@ -41,6 +41,39 @@ SYSCALL_DEFINE0(arc_gettls) > return task_thread_info(current)->thr_ptr; > } > > +SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) > +{ > + int uval; > + int ret; > + > + /* > + * This is only for old cores lacking LLOCK/SCOND, which by defintion > + * can't possibly be SMP. Thus doesn't need to be SMP safe. > + * And this also helps reduce the overhead for serializing in > + * the UP case > + */ > + WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP)); > + > + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) > + return -EFAULT; > + > + preempt_disable(); > + > + ret = __get_user(uval, uaddr); > + if (ret) > + goto done; > + > + if (uval != expected) > + ret = -EAGAIN; > + else > + ret = __put_user(new, uaddr); > + > +done: > + preempt_enable(); > + > + return ret; > +} It seems there is a subtle issue with this interface. Userspace cares more about "prev" value to be able to build it's own state machine(s) - my existing uclibc code was flawed as it was tight looping on the errno result. We can add a return param, by passing a pointer, but I think it would be better (and slightly cheaper) to just ditch the errno and simply return the prev value which and current value could be checked for success/fail decision etc. -Vineet ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH] ARC: tweak semantics of userspace cmpxchg assist 2016-11-04 20:16 ` Vineet Gupta @ 2016-11-07 18:50 ` Vineet Gupta 0 siblings, 0 replies; 24+ messages in thread From: Vineet Gupta @ 2016-11-07 18:50 UTC (permalink / raw) To: linux-snps-arc; +Cc: Alexey.Brodkin, linux-kernel, Vineet Gupta The original code only used to return errno to indicate if cmpxchg succeeded. It was not returning the "previous" value which typical cmpxhg callers are interested in to build their slowpaths or retry loops. Given user preemption in syscall return path etc, it is not wise to check this in userspace afterwards, but should what kernel actually observed in the syscall. So change the syscall interface to always return the previous value and additionally set Z flag to indicate whether operation succeeded or not (just like ARM implementation when they used ot have this syscall) The flag approach avoids having to put_user errno and specially when the use case for this syscall cares mostly about the "previous" value. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> --- arch/arc/include/asm/arcregs.h | 2 ++ arch/arc/kernel/process.c | 20 +++++++++++--------- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h index 7f3f9f63708c..1bd24ec3e350 100644 --- a/arch/arc/include/asm/arcregs.h +++ b/arch/arc/include/asm/arcregs.h @@ -43,12 +43,14 @@ #define STATUS_AE_BIT 5 /* Exception active */ #define STATUS_DE_BIT 6 /* PC is in delay slot */ #define STATUS_U_BIT 7 /* User/Kernel mode */ +#define STATUS_Z_BIT 11 #define STATUS_L_BIT 12 /* Loop inhibit */ /* These masks correspond to the status word(STATUS_32) bits */ #define STATUS_AE_MASK (1<<STATUS_AE_BIT) #define STATUS_DE_MASK (1<<STATUS_DE_BIT) #define STATUS_U_MASK (1<<STATUS_U_BIT) +#define STATUS_Z_MASK (1<<STATUS_Z_BIT) #define STATUS_L_MASK (1<<STATUS_L_BIT) /* diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index 59aa43cb146e..a41a79a4f4fe 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -43,8 +43,8 @@ SYSCALL_DEFINE0(arc_gettls) SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) { - int uval; - int ret; + struct pt_regs *regs = current_pt_regs(); + int uval = -EFAULT; /* * This is only for old cores lacking LLOCK/SCOND, which by defintion @@ -54,24 +54,26 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) */ WARN_ON_ONCE(IS_ENABLED(CONFIG_SMP)); + /* Z indicates to userspace if operation succeded */ + regs->status32 &= ~STATUS_Z_MASK; + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) return -EFAULT; preempt_disable(); - ret = __get_user(uval, uaddr); - if (ret) + if (__get_user(uval, uaddr)) goto done; - if (uval != expected) - ret = -EAGAIN; - else - ret = __put_user(new, uaddr); + if (uval == expected) { + if (!__put_user(new, uaddr)) + regs->status32 |= STATUS_Z_MASK; + } done: preempt_enable(); - return ret; + return uval; } void arch_cpu_idle(void) -- 2.7.4 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC] perf: fix building for ARCv1 [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta @ 2015-10-30 6:21 ` Vineet Gupta 1 sibling, 0 replies; 24+ messages in thread From: Vineet Gupta @ 2015-10-30 6:21 UTC (permalink / raw) To: Alexey Brodkin Cc: Peter Zijlstra, linux-perf-users, Arnaldo Carvalho de Melo, lkml, linux-snps-arc On Saturday 17 October 2015 07:06 PM, Alexey Brodkin wrote: > Perf uses atomic options and so it is required to have atomics enabled > in toolchain. > > In case of ARC atomics are enabled by default for ARCv2 but disabled for > ARCv1. Now we explicitly enable atomics for either ARC achitecture > version so perf could be successfully built. > > Currently on attempt to build perf for ARCv1 you'll see tons of: > ----------------->8----------------- > undefined reference to `__sync_add_and_fetch_4' > ----------------->8----------------- > > Still note if ARCv1 CPU is configured without LL/SC perf will crash on > execution once "llock" instruction is attempted to be executed. > > Cc: Vineet Gupta <vgupta@synopsys.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> > --- > tools/perf/config/Makefile | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile > index 38a0853..dc7c0a8 100644 > --- a/tools/perf/config/Makefile > +++ b/tools/perf/config/Makefile > @@ -47,6 +47,11 @@ ifeq ($(ARCH),arm64) > LIBUNWIND_LIBS = -lunwind -lunwind-aarch64 > endif > > +# Additional ARCH settings for ARC > +ifeq ($(ARCH),arc) > + CFLAGS += -matomic > +endif > + > ifeq ($(NO_PERF_REGS),0) > $(call detected,CONFIG_PERF_REGS) > endif ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2016-11-07 18:50 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <1445088959-3058-1-git-send-email-abrodkin@synopsys.com> 2015-10-17 14:19 ` [RFC] perf: fix building for ARCv1 Vineet Gupta 2015-10-18 11:15 ` Alexey Brodkin 2015-10-18 23:14 ` Andi Kleen 2015-10-19 4:58 ` Vineet Gupta 2015-10-19 5:49 ` Andi Kleen 2015-10-19 9:28 ` Vineet Gupta 2015-10-19 9:35 ` Peter Zijlstra 2015-10-19 9:46 ` Vineet Gupta 2015-10-19 9:51 ` Peter Zijlstra 2015-10-19 10:04 ` Vineet Gupta 2015-10-20 8:00 ` Vineet Gupta 2015-10-20 10:11 ` Peter Zijlstra 2015-10-20 10:45 ` Vineet Gupta 2015-10-29 15:58 ` Alexey Brodkin 2015-10-30 6:19 ` Vineet Gupta 2016-02-03 16:20 ` Alexey Brodkin [not found] ` <1454516455.2811.4.camel__10775.5710989752$1454516490$gmane$org@synopsys.com> 2016-02-04 4:13 ` Vineet Gupta 2016-02-05 11:18 ` Noam Camus [not found] ` <20160205161027.GG28242@kernel.org> 2016-02-10 3:09 ` Vineet Gupta 2016-10-18 18:58 ` Vineet Gupta 2016-10-24 16:17 ` [PATCH-v2] ARC: syscall for userspace cmpxchg assist Vineet Gupta 2016-11-04 20:16 ` Vineet Gupta 2016-11-07 18:50 ` [PATCH] ARC: tweak semantics of " Vineet Gupta 2015-10-30 6:21 ` [RFC] perf: fix building for ARCv1 Vineet Gupta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).