* Mipsel libc with LL/SC online anywhere?
@ 2002-07-12 13:04 ` Kevin D. Kissell
0 siblings, 0 replies; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-12 13:04 UTC (permalink / raw)
To: linux-mips
I'm benchmarking some code that does lots of
semaphores, and with the libc from the "standard"
MIPS/SGI RH 7.1 distribution, those are done using
sysmips, in the interest of universality. Regardless of
whether and how the ongoing argument of How Things
Should Be is settled, is there a copy of an up-to-date
glibc package built to use ll/sc out there on anyone's
FTP or web server? I suppose I could extract and
replace the necessary routines by hand, but that would
be slow and fraught with the risk of error...
Regards,
Kevin K.
* LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-12 13:04 ` Kevin D. Kissell
(?)
@ 2002-07-19 12:38 ` Johannes Stezenbach
2002-07-19 15:54 ` Richard Hodges
2002-07-25 16:25 ` Johannes Stezenbach
-1 siblings, 2 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-19 12:38 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips
On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
> I'm benchmarking some code that does lots of
> semaphores, and with the libc from the "standard"
> MIPS/SGI RH 7.1 distribution, those are done using
> sysmips, in the interest of universality.
I'm working on a platform without LL/SC, an embedded system/SOC
with a NEC VR4120A CPU core. To find out the effect of sysmips
vs. emulated LL/SC vs. the branch-likely trick posted by
Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
I created an experimental patch for glibc-2.2.5 which allows
run-time switching of the _test_and_set() and __compare_and_swap()
implementation based on the presence of two "switch files" in /etc/.
Despite its ugliness, I include the patch below for those interested.
(Note: I built my glibc with -mips2, so the patch lacks .set mips2
directives.)
One thing that caused me some headaches was that the __compare_and_swap()
implementation in glibc-2.2.5 is broken (but fixed in glibc CVS and H.J.Lu's
patch).
For lack of a better benchmark I used some of the examples from
glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
of three successive runs of 'time exN >/dev/null'.
sysmips:
ex1 real 0m0.273s user 0m0.040s sys 0m0.230s
ex2 real 0m10.911s user 0m2.730s sys 0m8.180s
ex3 real 0m3.648s user 0m3.400s sys 0m0.250s
ex5 real 0m4.539s user 0m1.830s sys 0m2.710s
ll/sc emulation:
ex1 real 0m0.272s user 0m0.020s sys 0m0.250s
ex2 real 0m4.726s user 0m1.660s sys 0m3.060s
ex3 real 0m3.968s user 0m3.750s sys 0m0.220s
ex5 real 0m4.069s user 0m1.710s sys 0m2.360s
beql-hack:
ex1 real 0m0.268s user 0m0.010s sys 0m0.260s
ex2 real 0m3.988s user 0m1.620s sys 0m2.360s
ex3 real 0m3.965s user 0m3.740s sys 0m0.220s
ex5 real 0m2.606s user 0m1.000s sys 0m1.600s
I think the poor performance of sysmips is caused by the absence of
__compare_and_swap(), which forces libpthread to use less efficient
implementations for semaphore and lock functions.
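A rough illustration of the difference, as a sketch only: with nothing but test-and-set, every read-modify-write of shared data has to be wrapped in a spinlock, while compare-and-swap can update the data directly. The code below uses portable C11 <stdatomic.h> as a stand-in for the MIPS primitives; tas_lock, counter_add_locked() and counter_add_cas() are names invented for this example and are not taken from linuxthreads.

```c
#include <stdatomic.h>

/* Test-and-set only: the primitive can just flip a flag, so any
   read-modify-write of other data must be wrapped in a spinlock. */
static atomic_flag tas_lock = ATOMIC_FLAG_INIT;

static long counter_add_locked(long *counter, long delta)
{
    while (atomic_flag_test_and_set(&tas_lock))
        ;                       /* spin until the flag was clear */
    long result = (*counter += delta);
    atomic_flag_clear(&tas_lock);
    return result;
}

/* Compare-and-swap: retry until nobody changed the value between
   our read and our write; no separate lock is needed. */
static long counter_add_cas(_Atomic long *counter, long delta)
{
    long old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + delta))
        ;                       /* 'old' was refreshed, try again */
    return old + delta;
}
```

Both functions return the new counter value; the locked variant pays for two atomic operations plus possible spinning on every call, which is the kind of overhead that adds up when each primitive is a system call.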
Running each of the four tests three times yields around one million
LL/SC emulations in /proc/cpuinfo.
I think the beql-hack needs a kernel patch to guarantee k1 !=
MAGIC_COOKIE after each eret, but for those few tests I was just
taking my chances.
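The retry logic of the branch-likely trick, and why the kernel has to leave k1 != MAGIC_COOKIE on return, can be modeled in plain C. This is only a user-space simulation under stated assumptions: struct sim_cpu, sim_maybe_exception() and sim_test_and_set() are invented for illustration, and the "exception" is faked by a counter instead of real hardware.

```c
#include <stdint.h>

#define MAGIC_COOKIE 0xffaaffaau

/* Simulated CPU state: k1 stands for the kernel scratch register $27. */
struct sim_cpu {
    uint32_t k1;
    int pending_exceptions;     /* how many exceptions to fake */
};

/* Pretend an exception hit between the load and the store.  A kernel
   with the guarantee discussed above returns with k1 != MAGIC_COOKIE. */
static void sim_maybe_exception(struct sim_cpu *cpu)
{
    if (cpu->pending_exceptions > 0) {
        cpu->pending_exceptions--;
        cpu->k1 = 0;            /* models "move k1,$0" before eret */
    }
}

/* Returns the old value of *p; stores v unless *p already equals v. */
static int sim_test_and_set(struct sim_cpu *cpu, int *p, int v, int *retries)
{
    for (;;) {
        cpu->k1 = MAGIC_COOKIE;         /* move $27,cookie */
        int r = *p;                     /* lw */
        if (r == v)
            return r;                   /* nothing to store */
        sim_maybe_exception(cpu);       /* window where we may be interrupted */
        if (cpu->k1 == MAGIC_COOKIE) {  /* beql: store only if k1 is intact */
            *p = v;                     /* sw in the delay slot */
            return r;
        }
        (*retries)++;                   /* k1 was clobbered, start over */
    }
}
```

If the simulated kernel ever returned with k1 still equal to MAGIC_COOKIE, the store would go through although another context might have touched *p in between, which is exactly the case the kernel patch is meant to rule out.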
Next, I'm trying to run the pthread tests from LTP. If someone
has a better benchmark code for pthread performance, I'm interested.
Regards,
Johannes
diff -uarN glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pspinlock.c glibc-2.2.5/linuxthreads/sysdeps/mips/pspinlock.c
--- glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pspinlock.c Thu Jul 18 14:28:07 2002
+++ glibc-2.2.5/linuxthreads/sysdeps/mips/pspinlock.c Thu Jul 18 18:35:46 2002
@@ -23,7 +23,93 @@
#include <sys/tas.h>
#include "internals.h"
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+static int
+_compare_and_swap_mips2 (long int *p, long int oldval, long int newval)
+{
+ long int ret;
+
+ __asm__ __volatile__
+ ("/* Inline compare & swap */\n\t"
+ "1:\n\t"
+ "ll %0,%4\n\t"
+ ".set push\n"
+ ".set noreorder\n\t"
+ "bne %0,%2,2f\n\t"
+ "move %0,$0\n\t"
+ "move %0,%3\n\t"
+ ".set pop\n\t"
+ "sc %0,%1\n\t"
+ "beqz %0,1b\n"
+ "2:\n\t"
+ "/* End compare & swap */"
+ : "=&r" (ret), "=m" (*p)
+ : "r" (oldval), "r" (newval), "m" (*p)
+ : "memory");
+
+ return ret;
+}
+
+static int
+_compare_and_swap_mips2_nollsc (long int *p, long int oldval, long int newval)
+{
+ long int r, t;
+
+ __asm__ __volatile__
+ (".set push\n\t"
+ ".set noreorder\n\t"
+ "li %1,0xffaaffaa\n\t" /* MAGIC_COOKIE */
+ "1:\n\t"
+ "move $27,%1\n\t" /* set k1 */
+ "lw %0,%5\n\t" /* r = *p */
+ "bne %0,%3,3f\n\t" /* if (r != oldval) return 0 */
+ "move %0,$0\n\t" /* r = 0 */
+ "move %0,%4\n\t" /* r = newval */
+ "beql $27,%1,2f\n\t" /* test k1 for change */
+ "sw %0,%2\n\t" /* *p = r; return 1 */
+ "b 1b\n\t" /* k1 changed, retry */
+ "nop\n\t"
+ ".set pop\n\t"
+ "2:\n"
+ "li %0,1\n\t" /* r = 1 */
+ "3:\n"
+ : "=&r" (r), "=&r" (t), "=m" (*p)
+ : "r" (oldval), "r" (newval), "m" (*p)
+ : "memory");
+
+ return r;
+}
+
+int
+compare_and_swap_is_available (void)
+{
+ int fp;
+ /* FIXME: write real test */
+ if ((fp =open ("/etc/mips2_cpu_without_llsc", O_RDONLY)) != -1)
+ {
+ close(fp);
+ _mips_compare_and_swap = _compare_and_swap_mips2_nollsc;
+ return 1;
+ }
+ if ((fp =open ("/etc/mips2_cpu_with_llsc", O_RDONLY)) != -1)
+ {
+ close(fp);
+ _mips_compare_and_swap = _compare_and_swap_mips2;
+ return 1;
+ }
+ return 0;
+}
+
+int (* _mips_compare_and_swap) (long int *p, long int oldval, long int newval)
+ = NULL;
+
+
+#if 0 && (_MIPS_ISA >= _MIPS_ISA_MIPS2)
+ /* don't bother, no one uses this... */
/* This implementation is similar to the one used in the Linux kernel. */
int
diff -uarN glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pt-machine.h glibc-2.2.5/linuxthreads/sysdeps/mips/pt-machine.h
--- glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pt-machine.h Thu Jul 18 14:28:13 2002
+++ glibc-2.2.5/linuxthreads/sysdeps/mips/pt-machine.h Thu Jul 18 16:27:15 2002
@@ -33,41 +33,11 @@
/* Spinlock implementation; required. */
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
-PT_EI long int
-testandset (int *spinlock)
-{
- long int ret, temp;
-
- __asm__ __volatile__
- ("/* Inline spinlock test & set */\n\t"
- "1:\n\t"
- "ll %0,%3\n\t"
- ".set push\n\t"
- ".set noreorder\n\t"
- "bnez %0,2f\n\t"
- " li %1,1\n\t"
- ".set pop\n\t"
- "sc %1,%2\n\t"
- "beqz %1,1b\n"
- "2:\n\t"
- "/* End spinlock test & set */"
- : "=&r" (ret), "=&r" (temp), "=m" (*spinlock)
- : "m" (*spinlock)
- : "memory");
-
- return ret;
-}
-
-#else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
-
PT_EI long int
testandset (int *spinlock)
{
return _test_and_set (spinlock, 1);
}
-#endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
/* Get some notion of the current stack. Need not be exactly the top
@@ -78,32 +48,13 @@
/* Compare-and-swap for semaphores. */
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
#define HAS_COMPARE_AND_SWAP
+#define TEST_FOR_COMPARE_AND_SWAP
+extern int (* _mips_compare_and_swap) (long int *p, long int oldval, long int newval);
+extern int compare_and_swap_is_available (void);
+
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
- long int ret;
-
- __asm__ __volatile__
- ("/* Inline compare & swap */\n\t"
- "1:\n\t"
- "ll %0,%4\n\t"
- ".set push\n"
- ".set noreorder\n\t"
- "bne %0,%2,2f\n\t"
- " move %0,%3\n\t"
- ".set pop\n\t"
- "sc %0,%1\n\t"
- "beqz %0,1b\n"
- "2:\n\t"
- "/* End compare & swap */"
- : "=&r" (ret), "=m" (*p)
- : "r" (oldval), "r" (newval), "m" (*p)
- : "memory");
-
- return ret;
+ return _mips_compare_and_swap (p, oldval, newval);
}
-
-#endif /* (_MIPS_ISA >= _MIPS_ISA_MIPS2) */
diff -uarN glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/_test_and_set.c glibc-2.2.5/sysdeps/unix/sysv/linux/mips/_test_and_set.c
--- glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/_test_and_set.c Thu Jul 18 00:21:15 2002
+++ glibc-2.2.5/sysdeps/unix/sysv/linux/mips/_test_and_set.c Thu Jul 18 14:39:01 2002
@@ -21,6 +21,12 @@
defined in sys/tas.h */
#include <features.h>
+#include <sgidefs.h>
+#include <sys/sysmips.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
#define _EXTERN_INLINE
#ifndef __USE_EXTERN_INLINES
@@ -28,3 +34,80 @@
#endif
#include "sys/tas.h"
+
+
+static int
+_test_and_set_mips2_nollsc (int *p, int v) __THROW
+{
+ int r, t;
+
+ __asm__ __volatile__
+ (".set push\n\t"
+ ".set noreorder\n\t"
+ "li %1,0xffaaffaa\n\t" /* MAGIC_COOKIE */
+ "1:\n\t"
+ "move $27,%1\n\t" /* set k1 */
+ "lw %0,%3\n\t" /* r = *p */
+ "beq %0,%4,2f\n\t" /* if (*p == v) return r */
+ "beql $27,%1,2f\n\t" /* test k1 for change */
+ "sw %4,%2\n\t" /* *p = v; return r */
+ "b 1b\n\t" /* retry */
+ "nop\n\t"
+ ".set pop\n\t"
+ "2:\n"
+ : "=&r" (r), "=&r" (t), "=m" (*p)
+ : "m" (*p), "r" (v)
+ : "memory");
+
+ return r;
+}
+
+static int
+_test_and_set_mips2 (int *p, int v) __THROW
+{
+ int r, t;
+
+ __asm__ __volatile__
+ ("1:\n\t"
+ "ll %0,%3\n\t"
+ ".set push\n\t"
+ ".set noreorder\n\t"
+ "beq %0,%4,2f\n\t"
+ " move %1,%4\n\t"
+ ".set pop\n\t"
+ "sc %1,%2\n\t"
+ "beqz %1,1b\n"
+ "2:\n"
+ : "=&r" (r), "=&r" (t), "=m" (*p)
+ : "m" (*p), "r" (v)
+ : "memory");
+
+ return r;
+}
+
+static int
+_test_and_set_mips1 (int *p, int v) __THROW
+{
+ return sysmips (MIPS_ATOMIC_SET, (int) p, v, 0);
+}
+
+static int
+_mips_test_and_set_init (int *p, int v) __THROW
+{
+ int fp;
+ _mips_test_and_set = _test_and_set_mips1;
+ /* FIXME: write real test */
+ if ((fp =open ("/etc/mips2_cpu_without_llsc", O_RDONLY)) != -1)
+ {
+ close(fp);
+ _mips_test_and_set = _test_and_set_mips2_nollsc;
+ }
+ else if ((fp =open ("/etc/mips2_cpu_with_llsc", O_RDONLY)) != -1)
+ {
+ close(fp);
+ _mips_test_and_set = _test_and_set_mips2;
+ }
+ return _mips_test_and_set (p, v);
+}
+
+int (* _mips_test_and_set) (int *p, int v) __THROW = _mips_test_and_set_init;
diff -uarN glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/sys/tas.h glibc-2.2.5/sysdeps/unix/sysv/linux/mips/sys/tas.h
--- glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/sys/tas.h Thu Jul 18 00:13:21 2002
+++ glibc-2.2.5/sysdeps/unix/sysv/linux/mips/sys/tas.h Thu Jul 18 00:26:54 2002
@@ -27,6 +27,7 @@
__BEGIN_DECLS
extern int _test_and_set (int *p, int v) __THROW;
+extern int (* _mips_test_and_set) (int *p, int v) __THROW;
#ifdef __USE_EXTERN_INLINES
@@ -34,40 +35,11 @@
# define _EXTERN_INLINE extern __inline
# endif
-# if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
-_EXTERN_INLINE int
-_test_and_set (int *p, int v) __THROW
-{
- int r, t;
-
- __asm__ __volatile__
- ("1:\n\t"
- "ll %0,%3\n\t"
- ".set push\n\t"
- ".set noreorder\n\t"
- "beq %0,%4,2f\n\t"
- " move %1,%4\n\t"
- ".set pop\n\t"
- "sc %1,%2\n\t"
- "beqz %1,1b\n"
- "2:\n"
- : "=&r" (r), "=&r" (t), "=m" (*p)
- : "m" (*p), "r" (v)
- : "memory");
-
- return r;
-}
-
-# else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
-
_EXTERN_INLINE int
_test_and_set (int *p, int v) __THROW
{
- return sysmips (MIPS_ATOMIC_SET, (int) p, v, 0);
+ return _mips_test_and_set (p, v);
}
-
-# endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
#endif /* __USE_EXTERN_INLINES */
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-19 12:38 ` LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?] Johannes Stezenbach
@ 2002-07-19 15:54 ` Richard Hodges
2002-07-22 10:35 ` Johannes Stezenbach
2002-07-25 16:25 ` Johannes Stezenbach
1 sibling, 1 reply; 12+ messages in thread
From: Richard Hodges @ 2002-07-19 15:54 UTC (permalink / raw)
To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips
On Fri, 19 Jul 2002, Johannes Stezenbach wrote:
> I'm working on a platform without LL/SC, an embedded system/SOC
> with a NEC VR4120A CPU core. To find out the effect of sysmips
> vs. emulated LL/SC vs. the branch-likely trick posted by
> Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
> I created an experimental patch for glibc-2.2.5 which allows
> run-time switching of the _test_and_set() and __compare_and_swap()
> implementation based on the presence of two "switch files" in /etc/.
...
> I think the beql-hack needs a kernel patch to guarantee k1 !=
> MAGIC_COOKIE after each eret, but for those few tests I was just
> taking my chances.
Maybe something like this in front of every "eret" instruction?
#ifdef CONFIG_CPU_VR41XX
move $27,$0
#endif
I am also working with an NEC core, and would much prefer to perform
atomic operations in user space. (I understand that this trick is
probably not SMP safe - I don't really care.)
-Richard
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-19 15:54 ` Richard Hodges
@ 2002-07-22 10:35 ` Johannes Stezenbach
0 siblings, 0 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-22 10:35 UTC (permalink / raw)
To: Richard Hodges; +Cc: Kevin D. Kissell, linux-mips
On Fri, Jul 19, 2002 at 08:54:46AM -0700, Richard Hodges wrote:
> On Fri, 19 Jul 2002, Johannes Stezenbach wrote:
>
> > I think the beql-hack needs a kernel patch to guarantee k1 !=
> > MAGIC_COOKIE after each eret, but for those few tests I was just
> > taking my chances.
>
> Maybe something like this in front of every "eret" instruction?
>
> #ifdef CONFIG_CPU_VR41XX
> move $27,$0
> #endif
The Sony patch for CPUs without LL/SC and without branch-likely
(posted here on Tue 22 Jan 2002 15:27:44 +0900 by
Machida Hiroyuki <machida@sm.sony.co.jp>) requires loading
a certain magic cookie into k1 before every eret/rfe.
OTOH, Kevin D. Kissell speculates that for the branch-likely
trick it might be possible to find a magic value that can
never end up in k1 after an eret, as a side effect of the
current implementation. So we wouldn't have to patch the
kernel at all.
I for one would be content if I could find a magic cookie value
that lets me avoid adding instructions to the TLB refill handler.
Johannes
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-19 12:38 ` LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?] Johannes Stezenbach
2002-07-19 15:54 ` Richard Hodges
@ 2002-07-25 16:25 ` Johannes Stezenbach
2002-07-25 17:06 ` Jun Sun
1 sibling, 1 reply; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 16:25 UTC (permalink / raw)
To: Kevin D. Kissell; +Cc: linux-mips
On Fri, Jul 19, 2002 at 02:38:29PM +0200, Johannes Stezenbach wrote:
> On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
> > I'm benchmarking some code that does lots of
> > semaphores, and with the libc from the "standard"
> > MIPS/SGI RH 7.1 distribution, those are done using
> > sysmips, in the interest of universality.
>
> I'm working on a platform without LL/SC, an embedded system/SOC
> with a NEC VR4120A CPU core. To find out the effect of sysmips
> vs. emulated LL/SC vs. the branch-likely trick posted by
> Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
> I created an experimental patch for glibc-2.2.5 which allows
> run-time switching of the _test_and_set() and __compare_and_swap()
> implementation based on the presence of two "switch files" in /etc/.
...
> For lack of a better benchmark I used some of the examples from
> glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
> of three successive runs of 'time exN >/dev/null'.
I did some more benchmarking with a test application based on
gtk+-directfb (http://directfb.org/). The benchmark does not
include GUI stuff, but rather reading of lots of external data
into internal data structures (which are GLib-2.0 GObjects).
The test application has three threads, but nearly all processing
is done in the main thread.
I think that the numbers are meaningful for our type of application.
sysmips:
real 1m19.358s
user 0m28.150s
sys 0m47.250s
LL/SC emulation:
real 0m41.246s
user 0m25.390s
sys 0m12.240s
branch-likely hack (hm, still without kernel patch...):
real 0m25.126s
user 0m17.240s
sys 0m2.310s
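To put the figures above side by side, the ratios can be worked out with a few lines of C. This is just arithmetic on the numbers already listed; to_seconds() and speedup() are helper names made up for this sketch.

```c
/* Convert an "XmY.YYYs" reading from time(1) into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster 'candidate' is than 'baseline'. */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}
```

For the real times, speedup(to_seconds(1, 19.358), to_seconds(0, 41.246)) is about 1.92 for LL/SC emulation and speedup(to_seconds(1, 19.358), to_seconds(0, 25.126)) is about 3.16 for the branch-likely hack. The gap is much wider in system time: roughly 3.9x and 20x respectively.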
Regards,
Johannes
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 16:25 ` Johannes Stezenbach
@ 2002-07-25 17:06 ` Jun Sun
2002-07-25 18:45 ` Johannes Stezenbach
0 siblings, 1 reply; 12+ messages in thread
From: Jun Sun @ 2002-07-25 17:06 UTC (permalink / raw)
To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips
Johannes Stezenbach wrote:
> On Fri, Jul 19, 2002 at 02:38:29PM +0200, Johannes Stezenbach wrote:
>
>>On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
>>
>>>I'm benchmarking some code that does lots of
>>>semaphores, and with the libc from the "standard"
>>>MIPS/SGI RH 7.1 distribution, those are done using
>>>sysmips, in the interest of universality.
>>
>>I'm working on a platform without LL/SC, an embedded system/SOC
>>with a NEC VR4120A CPU core. To find out the effect of sysmips
>>vs. emulated LL/SC vs. the branch-likely trick posted by
>>Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
>>I created an experimental patch for glibc-2.2.5 which allows
>>run-time switching of the _test_and_set() and __compare_and_swap()
>>implementation based on the presence of two "switch files" in /etc/.
>
> ...
>
>>For lack of a better benchmark I used some of the examples from
>>glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
>>of three successive runs of 'time exN >/dev/null'.
>
>
> I did some more benchmarking with a test application based on
> gtk+-directfb (http://directfb.org/). The benchmark does not
> include GUI stuff, but rather reading of lots of external data
> into internal data structures (which are GLib-2.0 GObjects).
> The test application has three threads, but nearly all processing
> is done in the main thread.
>
> I think that the numbers are meaningful for our type of application.
>
> sysmips:
> real 1m19.358s
> user 0m28.150s
> sys 0m47.250s
>
> LL/SC emulation:
> real 0m41.246s
> user 0m25.390s
> sys 0m12.240s
>
> branch-likely hack (hm, still without kernel patch...):
> real 0m25.126s
> user 0m17.240s
> sys 0m2.310s
Johannes,
This is great stuff! Can you explain what are "real", "user", and "sys"?
Also, what is your initial conclusion?
Jun
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 17:06 ` Jun Sun
@ 2002-07-25 18:45 ` Johannes Stezenbach
2002-07-25 18:56 ` Jun Sun
2002-07-25 21:49 ` Kevin D. Kissell
0 siblings, 2 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 18:45 UTC (permalink / raw)
To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips
On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
> Johannes Stezenbach wrote:
> >sysmips:
> > real 1m19.358s
> > user 0m28.150s
> > sys 0m47.250s
> >
> >LL/SC emulation:
> > real 0m41.246s
> > user 0m25.390s
> > sys 0m12.240s
> >
> >branch-likely hack (hm, still without kernel patch...):
> > real 0m25.126s
> > user 0m17.240s
> > sys 0m2.310s
>
> Johannes,
>
> This is great stuff! Can you explain what are "real", "user", and "sys"?
> Also, what is your initial conclusion?
These are results from simple 'time ./testapp' testing, so it's real time
and user/system time reported by wait(4).
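For anyone who wants to see where the three numbers come from, here is a minimal sketch of what a time-like wrapper does, assuming that wait4() is the call meant above. struct run_times and time_command() are names invented for this example: "real" is wall-clock time around the child, while "user" and "sys" are the CPU times the kernel accounted to the child in user mode and in kernel mode.

```c
#define _DEFAULT_SOURCE         /* for wait4() on glibc */
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

struct run_times {
    double real, user, sys;     /* all in seconds */
};

static double tv_to_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/* Run the command in argv and report its times.
   Returns the child's exit status, or -1 on error. */
static int time_command(char *const argv[], struct run_times *out)
{
    struct timeval start, end;
    struct rusage ru;
    int status;

    gettimeofday(&start, NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }
    if (wait4(pid, &status, 0, &ru) < 0)
        return -1;
    gettimeofday(&end, NULL);

    out->real = tv_to_seconds(end) - tv_to_seconds(start);
    out->user = tv_to_seconds(ru.ru_utime);   /* CPU time in user mode */
    out->sys  = tv_to_seconds(ru.ru_stime);   /* CPU time in the kernel */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With sysmips every lock operation is a system call, so the cost shows up under "sys"; with the branch-likely hack the same work stays in user space.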
Also, I have an interactive gtk+directfb application running. The
difference in response time is quite noticeable.
One reason for the big differences is that the GLib-2.0/GObject library
does a lot of locking in its internal type system for every object
created. Other software might not suffer as badly from a slow mutex
implementation.
My conclusion is that it is good for glibc to always use ll/sc,
emulated or not, and for my specific needs I will use the branch-likely
hack. So next I will study kernel source to decide what MAGIC_COOKIE
is best for the branch-likely hack, and where to add 'move k1,$0'
before eret.
OTOH I doubt it's worth it to add the branch-likely hack to
stock glibc. How many people are using Linux/MIPS on embedded
CPU's without LL/SC?
Johannes
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 18:45 ` Johannes Stezenbach
@ 2002-07-25 18:56 ` Jun Sun
2002-07-25 19:24 ` Johannes Stezenbach
2002-07-25 21:49 ` Kevin D. Kissell
1 sibling, 1 reply; 12+ messages in thread
From: Jun Sun @ 2002-07-25 18:56 UTC (permalink / raw)
To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips
Johannes Stezenbach wrote:
> On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
>
>>Johannes Stezenbach wrote:
>>
>>>sysmips:
>>> real 1m19.358s
>>> user 0m28.150s
>>> sys 0m47.250s
>>>
>>>LL/SC emulation:
>>> real 0m41.246s
>>> user 0m25.390s
>>> sys 0m12.240s
>>>
>>>branch-likely hack (hm, still without kernel patch...):
>>> real 0m25.126s
>>> user 0m17.240s
>>> sys 0m2.310s
>>
>>Johannes,
>>
>>This is great stuff! Can you explain what are "real", "user", and "sys"?
>>Also, what is your initial conclusion?
>
>
> These are results from simple 'time ./testapp' testing, so it's real time
> and user/system time reported by wait(4).
>
> Also, I have an interactive gtk+directfb application running. The
> difference in response time is quite noticeable.
>
> One reason for the big differences is that the GLib-2.0/GObject library
> does a lot of locking in its internal type system for every object
> created. Other software might not suffer as badly from a slow mutex
> implementation.
>
> My conclusion is that it is good for glibc to always use ll/sc,
> emulated or not, and for my specific needs I will use the branch-likely
> hack. So next I will study kernel source to decide what MAGIC_COOKIE
> is best for the branch-likely hack, and where to add 'move k1,$0'
> before eret.
>
> OTOH I doubt it's worth it to add the branch-likely hack to
> stock glibc. How many people are using Linux/MIPS on embedded
> CPU's without LL/SC?
>
There are probably more than you think. The popular (and notorious) NEC
VR41xx family falls into this category. I think at least one or two other
families of CPUs are like this too.
Jun
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 18:56 ` Jun Sun
@ 2002-07-25 19:24 ` Johannes Stezenbach
0 siblings, 0 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 19:24 UTC (permalink / raw)
To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips
On Thu, Jul 25, 2002 at 11:56:23AM -0700, Jun Sun wrote:
> Johannes Stezenbach wrote:
> >OTOH I doubt it's worth it to add the branch-likely hack to
> >stock glibc. How many people are using Linux/MIPS on embedded
> >CPU's without LL/SC?
>
> There are probably more than you think. The popular (and notorious) NEC
> VR41xx family falls into this category. I think at least one or two other
> families of CPUs are like this too.
Ok, then maybe we should have /proc/sys/ entries where the kernel
tells glibc about CPU capabilities and kernel support for
userspace atomic operations, like:
- no /proc/sys/mips/* : use ll/sc (maybe emulated)
- have /proc/sys/mips/mips2-without-llsc: use branch-likely, read
to get MAGIC_COOKIE to use
- have /proc/sys/mips/sony-ps2: whatever
- ...
Or use a sysmips() call to get the information.
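A minimal sketch of how glibc could consume such entries, under loud assumptions: none of these /proc/sys/mips files exist today, the text format (one hexadecimal cookie) is made up, and enum atomic_impl and probe_atomic_impl() are hypothetical names. The directory is passed in as a parameter so the logic can be tried against an ordinary directory.

```c
#include <stdio.h>

enum atomic_impl {
    ATOMIC_LLSC,            /* real or kernel-emulated ll/sc */
    ATOMIC_BRANCH_LIKELY    /* beql trick, needs a MAGIC_COOKIE */
};

/* Decide which implementation to use by looking under 'dir' (the
   proposal would use "/proc/sys/mips").  If the hypothetical
   "mips2-without-llsc" entry exists, read the cookie from it. */
static enum atomic_impl probe_atomic_impl(const char *dir,
                                          unsigned long *cookie)
{
    char path[256];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "%s/mips2-without-llsc", dir);
    f = fopen(path, "r");
    if (f == NULL)
        return ATOMIC_LLSC;             /* no entry: use ll/sc */

    ok = fscanf(f, "%lx", cookie) == 1; /* assumed format: hex number */
    fclose(f);
    /* An unreadable cookie falls back to ll/sc rather than guessing. */
    return ok ? ATOMIC_BRANCH_LIKELY : ATOMIC_LLSC;
}
```

Falling back to ll/sc when the entry is missing or unreadable keeps the default safe, since emulated ll/sc works everywhere, only more slowly.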
Johannes
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 18:45 ` Johannes Stezenbach
2002-07-25 18:56 ` Jun Sun
@ 2002-07-25 21:49 ` Kevin D. Kissell
2002-07-26 19:35 ` Kevin D. Kissell
1 sibling, 1 reply; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-25 21:49 UTC (permalink / raw)
To: Johannes Stezenbach; +Cc: Jun Sun, linux-mips
Johannes Stezenbach wrote:
>
> On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
> > Johannes Stezenbach wrote:
> > >sysmips:
> > > real 1m19.358s
> > > user 0m28.150s
> > > sys 0m47.250s
> > >
> > >LL/SC emulation:
> > > real 0m41.246s
> > > user 0m25.390s
> > > sys 0m12.240s
> > >
> > >branch-likely hack (hm, still without kernel patch...):
> > > real 0m25.126s
> > > user 0m17.240s
> > > sys 0m2.310s
> >
> > Johannes,
> >
> > This is great stuff! Can you explain what are "real", "user", and "sys"?
> > Also, what is your initial conclusion?
>
> These are results from simple 'time ./testapp' testing, so it's real time
> and user/system time reported by wait(4).
>
> Also, I have an interactive gtk+directfb application running. The
> difference in response time is quite noticeable.
>
> One reason for the big differences is that the GLib-2.0/GObject library
> does a lot of locking in its internal type system for every object
> created. Other software might not suffer as badly from a slow mutex
> implementation.
>
> My conclusion is that it is good for glibc to always use ll/sc,
> emulated or not, and for my specific needs I will use the branch-likely
> hack. So next I will study kernel source to decide what MAGIC_COOKIE
> is best for the branch-likely hack, and where to add 'move k1,$0'
> before eret.
I am convinced that there is a value, quite possibly 0xffdadaff,
which can provably never be in k1 at the return from an exception
in a sane system - but it would be tedious to prove, and the
assumption could very easily be perturbed. I think
that adding overhead to the TLB refill handler would be
highly undesirable, but fortunately the TLB refill handler
is one of those cases where we can be sure that members
of a set of values (including 0xffdadaff) could not be
in k1 unless the system was about to crash in any case.
The prudent thing to do would be to load the MAGIC_COOKIE
value explicitly into k1 on the way out of general exception
service. Fortunately, it looks to me as if at least half
of the overhead of this operation (LUI/ORI) can be concealed
in branch/jump delay slots that are currently going unfilled.
> OTOH I doubt it's worth it to add the branch-likely hack to
> stock glibc. How many people are using Linux/MIPS on embedded
> CPU's without LL/SC?
MIPSII-but-no-LL/SC CPUs are certainly the minority as far
as distinct designs and part numbers are concerned, but
I suspect they provide the overwhelming majority of actual
MIPS/Linux platforms in use, since both NEC Vr41xx-based
handhelds and Sony PlayStation 2's fall into that category.
Someone else may have better statistics than I do, though.
Regards,
Kevin K.
* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
2002-07-25 21:49 ` Kevin D. Kissell
@ 2002-07-26 19:35 ` Kevin D. Kissell
0 siblings, 0 replies; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-26 19:35 UTC (permalink / raw)
To: Johannes Stezenbach, Jun Sun, linux-mips
"Kevin D. Kissell" wrote:
> The prudent thing to do would be to load the MAGIC_COOKIE
> value explicitly into k1 on the way out of general exception
> service. Fortunately, it looks to me as if at least half
> of the overhead of this operation (LUI/ORI) can be concealed
> in branch/jump delay slots that are currently going unfilled.
I was distracted in the middle of that reply and got confused.
The MAGIC_COOKIE needs only to be destroyed, which can be
done in a single instruction. The 100% safe approach would
be to insert a "move k1,zero" instruction before all ERETs,
including those generated by the RESTORE_ALL_AND_RET macro
expansion, but it should be faster, if very slightly larger
and somewhat more burdensome for maintenance, to plant those
instructions in branch delay slots just "upstream" from the
context restore. I'm working on the code a bit and may
be able to propose (if not test :-) a patch along these
lines, but in looking at entry.S, I note something a bit
disturbing:
There's a lot of code in there that allows the assembler
to schedule the instructions, but which also contains
SSNOPs to force timing. Isn't that a bit dangerous?
Unless it is specified that the assembler will refuse
to reschedule SSNOPs, I think those sequences need to
be bracketed with .set noreorder directives. That would
also allow k1 destructors to be placed explicitly in
the delay slots, rather than assuming that they will
be put there by the assembler.
Regards,
Kevin K.