* Mipsel libc with LL/SC online anywhere?
@ 2002-07-12 13:04 ` Kevin D. Kissell
  0 siblings, 0 replies; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-12 13:04 UTC (permalink / raw)
  To: linux-mips

I'm benchmarking some code that does lots of
semaphores, and with the libc from the "standard"
MIPS/SGI RH 7.1 distribution, those are done using
sysmips, in the interest of universality.  Regardless of
whether and how the ongoing argument of How Things
Should Be is settled, is there a copy of an up-to-date
glibc package built to use ll/sc out there on anyone's
FTP or web server?  I suppose I could extract and
replace the necessary routines by hand, but that would
be slow and fraught with the risk of error...

            Regards,

            Kevin K.

* LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-12 13:04 ` Kevin D. Kissell
  (?)
@ 2002-07-19 12:38 ` Johannes Stezenbach
  2002-07-19 15:54   ` Richard Hodges
  2002-07-25 16:25   ` Johannes Stezenbach
  -1 siblings, 2 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-19 12:38 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
> I'm benchmarking some code that does lots of
> semaphores, and with the libc from the "standard"
> MIPS/SGI RH 7.1 distribution, those are done using
> sysmips, in the interest of universality.

I'm working on a platform without LL/SC, an embedded system/SOC
with a NEC VR4120A CPU core. To find out the effect of sysmips
vs. emulated LL/SC vs. the branch-likely trick posted by
Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
I created an experimental patch for glibc-2.2.5 which allows
run-time switching of the _test_and_set() and __compare_and_swap()
implementation based on the presence of two "switch files" in /etc/.

Despite its ugliness, I include the patch below for those interested.
(Note: I built my glibc with -mips2, so the patch lacks .set mips2
directives.)

One thing that caused me some headaches was that the __compare_and_swap()
implementation in glibc-2.2.5 is broken (but fixed in glibc CVS and H.J.Lu's
patch).

For lack of a better benchmark I used some of the examples from
glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
of three successive runs of 'time exN >/dev/null'.

sysmips:
  ex1	real    0m0.273s 	user    0m0.040s 	sys     0m0.230s
  ex2	real    0m10.911s 	user    0m2.730s 	sys     0m8.180s
  ex3	real    0m3.648s 	user    0m3.400s 	sys     0m0.250s
  ex5	real    0m4.539s 	user    0m1.830s 	sys     0m2.710s

ll/sc emulation:
  ex1	real    0m0.272s 	user    0m0.020s 	sys     0m0.250s
  ex2	real    0m4.726s 	user    0m1.660s 	sys     0m3.060s
  ex3	real    0m3.968s 	user    0m3.750s 	sys     0m0.220s
  ex5	real    0m4.069s 	user    0m1.710s 	sys     0m2.360s

beql-hack:
  ex1	real    0m0.268s 	user    0m0.010s 	sys     0m0.260s
  ex2	real    0m3.988s 	user    0m1.620s 	sys     0m2.360s
  ex3	real    0m3.965s 	user    0m3.740s 	sys     0m0.220s
  ex5	real    0m2.606s 	user    0m1.000s 	sys     0m1.600s

I think the poor performance of sysmips is caused by the absence of
__compare_and_swap(), which forces libpthread to use less efficient
implementations for semaphore and lock functions.
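
To make the point above concrete, here is a minimal sketch of the two
code paths, assuming GCC's __sync builtins as stand-ins for the real
primitives. It is not the linuxthreads code; the names sem_trydown_cas,
sem_trydown_tas and struct guarded_sem are made up for illustration.

```c
/* Path A: a native compare-and-swap lets a semaphore "try-down" be a
   short lock-free retry loop.  */
int
sem_trydown_cas (long *count)
{
  long old;

  do
    {
      old = *(volatile long *) count;
      if (old <= 0)
        return -1;              /* would block */
    }
  while (!__sync_bool_compare_and_swap (count, old, old - 1));
  return 0;
}

/* Path B: with only test-and-set, the counter has to be guarded by a
   spinlock, so every operation pays for an acquire and a release.  */
struct guarded_sem
{
  int lock;                     /* 0 = free, 1 = held */
  long count;
};

int
sem_trydown_tas (struct guarded_sem *s)
{
  int ret = -1;

  while (__sync_lock_test_and_set (&s->lock, 1))
    ;                           /* spin until the lock is ours */
  if (s->count > 0)
    {
      s->count--;
      ret = 0;
    }
  __sync_lock_release (&s->lock);
  return ret;
}
```

In the sysmips build every test-and-set in path B would be a system
call, which would be consistent with the high "sys" times above.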

Running each of the four tests three times yields around one million
LL/SC emulations in /proc/cpuinfo.

I think the beql-hack needs a kernel patch to guarantee k1 !=
MAGIC_COOKIE after each eret, but for those few tests I was just
taking my chances.
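
Why the guarantee matters can be shown with a small portable model of
the sequence. This is only a sketch: the variable k1 stands in for
register $27, the intr callback stands in for an exception taken
between the load and the store, and all function names are
hypothetical. The final test-and-store is written as one C statement
with no interruption point, which models the beql and its delay slot.

```c
#include <stddef.h>

#define MAGIC_COOKIE 0xffaaffaaUL

static unsigned long k1;        /* models register $27 (k1) */

typedef void (*intr_fn) (long *p);

/* Exception whose return path destroys the cookie (kernel patched).  */
void
intr_clobbers_k1 (long *p)
{
  *p += 100;                    /* another context changed the word */
  k1 = 0;
}

/* Exception that happens to leave the cookie in k1 (the unlucky case).  */
void
intr_keeps_cookie (long *p)
{
  *p += 100;
}

/* Model of the compare-and-swap loop; intr fires at most once.  */
int
cas_model (long *p, long oldval, long newval, intr_fn intr, int *retries)
{
  for (;;)
    {
      long r;

      k1 = MAGIC_COOKIE;        /* move $27,%1 */
      r = *p;                   /* lw */
      if (r != oldval)
        return 0;               /* bne ...,3f */
      if (intr != NULL)
        {
          intr (p);             /* exception inside the window */
          intr = NULL;
        }
      if (k1 == MAGIC_COOKIE)   /* beql $27,%1,2f */
        {
          *p = newval;          /* sw in the delay slot */
          return 1;
        }
      ++*retries;               /* b 1b */
    }
}
```

With intr_clobbers_k1 the loop retries, sees the changed value and
fails cleanly. With intr_keeps_cookie it overwrites the other
context's update, which is the failure the kernel patch has to rule
out.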

Next, I'm trying to run the pthread tests from LTP. If someone
has better benchmark code for pthread performance, I'm interested.
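
As a possible starting point, a minimal sketch that times uncontended
pthread_mutex_lock()/pthread_mutex_unlock() pairs in a single thread.
The function name mutex_pair_ns is hypothetical, and an uncontended
loop only exercises the fast path, so it complements rather than
replaces the linuxthreads examples.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Returns the average time in nanoseconds for one uncontended
   lock/unlock pair, measured over the given number of iterations.  */
double
mutex_pair_ns (long iterations)
{
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  struct timespec t0, t1;
  long i;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (i = 0; i < iterations; i++)
    {
      pthread_mutex_lock (&m);
      pthread_mutex_unlock (&m);
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  return ((t1.tv_sec - t0.tv_sec) * 1e9
          + (t1.tv_nsec - t0.tv_nsec)) / iterations;
}
```

Running it once per libc variant with a large iteration count should
give a per-operation cost that can be compared directly.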


Regards,
Johannes



diff -uarN glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pspinlock.c glibc-2.2.5/linuxthreads/sysdeps/mips/pspinlock.c
--- glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pspinlock.c	Thu Jul 18 14:28:07 2002
+++ glibc-2.2.5/linuxthreads/sysdeps/mips/pspinlock.c	Thu Jul 18 18:35:46 2002
@@ -23,7 +23,93 @@
 #include <sys/tas.h>
 #include "internals.h"
 
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+static int
+_compare_and_swap_mips2 (long int *p, long int oldval, long int newval)
+{
+  long int ret;
+
+  __asm__ __volatile__
+    ("/* Inline compare & swap */\n\t"
+     "1:\n\t"
+     "ll	%0,%4\n\t"
+     ".set	push\n"
+     ".set	noreorder\n\t"
+     "bne	%0,%2,2f\n\t"
+     "move	%0,$0\n\t"
+     "move	%0,%3\n\t"
+     ".set	pop\n\t"
+     "sc	%0,%1\n\t"
+     "beqz	%0,1b\n"
+     "2:\n\t"
+     "/* End compare & swap */"
+     : "=&r" (ret), "=m" (*p)
+     : "r" (oldval), "r" (newval), "m" (*p)
+     : "memory");
+
+  return ret;
+}
+
+static int
+_compare_and_swap_mips2_nollsc (long int *p, long int oldval, long int newval)
+{
+  long int r, t;
+
+  __asm__ __volatile__
+    (".set	push\n\t"
+     ".set	noreorder\n\t"
+     "li	%1,0xffaaffaa\n\t"	/* MAGIC_COOKIE */
+     "1:\n\t"
+     "move	$27,%1\n\t"		/* set k1 */
+     "lw	%0,%5\n\t"		/* r = *p */
+     "bne	%0,%3,3f\n\t"		/* if (r != oldval) return 0 */
+     "move	%0,$0\n\t"		/* r = 0 */
+     "move	%0,%4\n\t"		/* r = newval */
+     "beql	$27,%1,2f\n\t"		/* test k1 for change */
+     "sw	%0,%2\n\t"		/* *p = r; return 1 */
+     "b		1b\n\t"			/* k1 changed, retry */
+     "nop\n\t"
+     ".set	pop\n\t"
+     "2:\n"
+     "li	%0,1\n\t"		/* r = 1 */
+     "3:\n"
+     : "=&r" (r), "=&r" (t), "=m" (*p)
+     : "r" (oldval), "r" (newval), "m" (*p)
+     : "memory");
+
+  return r;
+}
+
+int
+compare_and_swap_is_available (void)
+{
+  int fp;
+  /* FIXME: write real test */
+  if ((fp =open ("/etc/mips2_cpu_without_llsc", O_RDONLY)) != -1)
+    {
+      close(fp);
+      _mips_compare_and_swap = _compare_and_swap_mips2_nollsc;
+      return 1;
+    }
+  if ((fp =open ("/etc/mips2_cpu_with_llsc", O_RDONLY)) != -1)
+    {
+      close(fp);
+      _mips_compare_and_swap = _compare_and_swap_mips2;
+      return 1;
+    }
+  return 0;
+}
+
+int (* _mips_compare_and_swap) (long int *p, long int oldval, long int newval)
+  = NULL;
+
+
+#if 0 && (_MIPS_ISA >= _MIPS_ISA_MIPS2)
+  /* don't bother, no one uses this... */
 
 /* This implementation is similar to the one used in the Linux kernel.  */
 int
diff -uarN glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pt-machine.h glibc-2.2.5/linuxthreads/sysdeps/mips/pt-machine.h
--- glibc-2.2.5.orig/linuxthreads/sysdeps/mips/pt-machine.h	Thu Jul 18 14:28:13 2002
+++ glibc-2.2.5/linuxthreads/sysdeps/mips/pt-machine.h	Thu Jul 18 16:27:15 2002
@@ -33,41 +33,11 @@
 
 /* Spinlock implementation; required.  */
 
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
-PT_EI long int
-testandset (int *spinlock)
-{
-  long int ret, temp;
-
-  __asm__ __volatile__
-    ("/* Inline spinlock test & set */\n\t"
-     "1:\n\t"
-     "ll	%0,%3\n\t"
-     ".set	push\n\t"
-     ".set	noreorder\n\t"
-     "bnez	%0,2f\n\t"
-     " li	%1,1\n\t"
-     ".set	pop\n\t"
-     "sc	%1,%2\n\t"
-     "beqz	%1,1b\n"
-     "2:\n\t"
-     "/* End spinlock test & set */"
-     : "=&r" (ret), "=&r" (temp), "=m" (*spinlock)
-     : "m" (*spinlock)
-     : "memory");
-
-  return ret;
-}
-
-#else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
-
 PT_EI long int
 testandset (int *spinlock)
 {
   return _test_and_set (spinlock, 1);
 }
-#endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
 
 
 /* Get some notion of the current stack.  Need not be exactly the top
@@ -78,32 +48,13 @@
 
 /* Compare-and-swap for semaphores. */
 
-#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
 #define HAS_COMPARE_AND_SWAP
+#define TEST_FOR_COMPARE_AND_SWAP
+extern int (* _mips_compare_and_swap) (long int *p, long int oldval, long int newval);
+extern int compare_and_swap_is_available (void);
+
 PT_EI int
 __compare_and_swap (long int *p, long int oldval, long int newval)
 {
-  long int ret;
-
-  __asm__ __volatile__
-    ("/* Inline compare & swap */\n\t"
-     "1:\n\t"
-     "ll	%0,%4\n\t"
-     ".set	push\n"
-     ".set	noreorder\n\t"
-     "bne	%0,%2,2f\n\t"
-     " move	%0,%3\n\t"
-     ".set	pop\n\t"
-     "sc	%0,%1\n\t"
-     "beqz	%0,1b\n"
-     "2:\n\t"
-     "/* End compare & swap */"
-     : "=&r" (ret), "=m" (*p)
-     : "r" (oldval), "r" (newval), "m" (*p)
-     : "memory");
-
-  return ret;
+  return _mips_compare_and_swap (p, oldval, newval);
 }
-
-#endif /* (_MIPS_ISA >= _MIPS_ISA_MIPS2) */
diff -uarN glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/_test_and_set.c glibc-2.2.5/sysdeps/unix/sysv/linux/mips/_test_and_set.c
--- glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/_test_and_set.c	Thu Jul 18 00:21:15 2002
+++ glibc-2.2.5/sysdeps/unix/sysv/linux/mips/_test_and_set.c	Thu Jul 18 14:39:01 2002
@@ -21,6 +21,12 @@
    defined in sys/tas.h  */
 
 #include <features.h>
+#include <sgidefs.h>
+#include <sys/sysmips.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
 
 #define _EXTERN_INLINE
 #ifndef __USE_EXTERN_INLINES
@@ -28,3 +34,80 @@
 #endif
 
 #include "sys/tas.h"
+
+
+static int
+_test_and_set_mips2_nollsc (int *p, int v) __THROW
+{
+  int r, t;
+
+  __asm__ __volatile__
+    (".set	push\n\t"
+     ".set	noreorder\n\t"
+     "li	%1,0xffaaffaa\n\t"	/* MAGIC_COOKIE */
+     "1:\n\t"
+     "move	$27,%1\n\t"		/* set k1 */
+     "lw	%0,%3\n\t"		/* r = *p */
+     "beq	%0,%4,2f\n\t"		/* if (*p == v) return r */
+     "beql	$27,%1,2f\n\t"		/* test k1 for change */
+     "sw	%4,%2\n\t"		/* *p = v; return r */
+     "b		1b\n\t"			/* retry */
+     "nop\n\t"
+     ".set	pop\n\t"
+     "2:\n"
+     : "=&r" (r), "=&r" (t), "=m" (*p)
+     : "m" (*p), "r" (v)
+     : "memory");
+
+  return r;
+}
+
+static int
+_test_and_set_mips2 (int *p, int v) __THROW
+{
+  int r, t;
+
+  __asm__ __volatile__
+    ("1:\n\t"
+     "ll	%0,%3\n\t"
+     ".set	push\n\t"
+     ".set	noreorder\n\t"
+     "beq	%0,%4,2f\n\t"
+     " move	%1,%4\n\t"
+     ".set	pop\n\t"
+     "sc	%1,%2\n\t"
+     "beqz	%1,1b\n"
+     "2:\n"
+     : "=&r" (r), "=&r" (t), "=m" (*p)
+     : "m" (*p), "r" (v)
+     : "memory");
+
+  return r;
+}
+
+static int
+_test_and_set_mips1 (int *p, int v) __THROW
+{
+  return sysmips (MIPS_ATOMIC_SET, (int) p, v, 0);
+}
+
+static int
+_mips_test_and_set_init (int *p, int v) __THROW
+{
+  int fp;
+  _mips_test_and_set = _test_and_set_mips1;
+  /* FIXME: write real test */
+  if ((fp =open ("/etc/mips2_cpu_without_llsc", O_RDONLY)) != -1)
+    {
+      close(fp);
+      _mips_test_and_set = _test_and_set_mips2_nollsc;
+    }
+  else if ((fp =open ("/etc/mips2_cpu_with_llsc", O_RDONLY)) != -1)
+    {
+      close(fp);
+      _mips_test_and_set = _test_and_set_mips2;
+    }
+  return _mips_test_and_set (p, v);
+}
+
+int (* _mips_test_and_set) (int *p, int v) __THROW = _mips_test_and_set_init;
diff -uarN glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/sys/tas.h glibc-2.2.5/sysdeps/unix/sysv/linux/mips/sys/tas.h
--- glibc-2.2.5.orig/sysdeps/unix/sysv/linux/mips/sys/tas.h	Thu Jul 18 00:13:21 2002
+++ glibc-2.2.5/sysdeps/unix/sysv/linux/mips/sys/tas.h	Thu Jul 18 00:26:54 2002
@@ -27,6 +27,7 @@
 __BEGIN_DECLS
 
 extern int _test_and_set (int *p, int v) __THROW;
+extern int (* _mips_test_and_set) (int *p, int v) __THROW;
 
 #ifdef __USE_EXTERN_INLINES
 
@@ -34,40 +35,11 @@
 #  define _EXTERN_INLINE extern __inline
 # endif
 
-# if (_MIPS_ISA >= _MIPS_ISA_MIPS2)
-
-_EXTERN_INLINE int
-_test_and_set (int *p, int v) __THROW
-{
-  int r, t;
-
-  __asm__ __volatile__
-    ("1:\n\t"
-     "ll	%0,%3\n\t"
-     ".set	push\n\t"
-     ".set	noreorder\n\t"
-     "beq	%0,%4,2f\n\t"
-     " move	%1,%4\n\t"
-     ".set	pop\n\t"
-     "sc	%1,%2\n\t"
-     "beqz	%1,1b\n"
-     "2:\n"
-     : "=&r" (r), "=&r" (t), "=m" (*p)
-     : "m" (*p), "r" (v)
-     : "memory");
-
-  return r;
-}
-
-# else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
-
 _EXTERN_INLINE int
 _test_and_set (int *p, int v) __THROW
 {
-  return sysmips (MIPS_ATOMIC_SET, (int) p, v, 0);
+  return _mips_test_and_set (p, v);
 }
-
-# endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */
 
 #endif /* __USE_EXTERN_INLINES */
 


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-19 12:38 ` LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?] Johannes Stezenbach
@ 2002-07-19 15:54   ` Richard Hodges
  2002-07-22 10:35     ` Johannes Stezenbach
  2002-07-25 16:25   ` Johannes Stezenbach
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Hodges @ 2002-07-19 15:54 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

On Fri, 19 Jul 2002, Johannes Stezenbach wrote:

> I'm working on a platform without LL/SC, an embedded system/SOC
> with a NEC VR4120A CPU core. To find out the effect of sysmips
> vs. emulated LL/SC vs. the branch-likely trick posted by
> Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
> I created an experimental patch for glibc-2.2.5 which allows
> run-time switching of the _test_and_set() and __compare_and_swap()
> implementation based on the presence of two "switch files" in /etc/.

...

> I think the beql-hack needs a kernel patch to guarantee k1 !=
> MAGIC_COOKIE after each eret, but for those few tests I was just
> taking my chances.

Maybe something like this in front of every "eret" instruction?

#ifdef CONFIG_CPU_VR41XX
	move	$27,$0
#endif

I am also working with an NEC core, and would much prefer to perform
atomic operations in user space.  (I understand that this trick is
probably not SMP safe - I don't really care.)

-Richard


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-19 15:54   ` Richard Hodges
@ 2002-07-22 10:35     ` Johannes Stezenbach
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-22 10:35 UTC (permalink / raw)
  To: Richard Hodges; +Cc: Kevin D. Kissell, linux-mips

On Fri, Jul 19, 2002 at 08:54:46AM -0700, Richard Hodges wrote:
> On Fri, 19 Jul 2002, Johannes Stezenbach wrote:
> 
> > I think the beql-hack needs a kernel patch to guarantee k1 !=
> > MAGIC_COOKIE after each eret, but for those few tests I was just
> > taking my chances.
> 
> Maybe something like this in front of every "eret" instruction?
> 
> #ifdef CONFIG_CPU_VR41XX
> 	move	$27,$0
> #endif

The Sony patch for CPUs without LL/SC and without branch-likely
(posted here on Tue 22 Jan 2002 15:27:44 +0900 by
Machida Hiroyuki <machida@sm.sony.co.jp>) requires loading
a certain magic cookie into k1 before every eret/rfe.

OTOH, Kevin D. Kissell speculates that for the branch-likely
trick it might be possible to find a magic value that already can
never end up in k1 after an eret, as a side effect of the
current implementation. So we wouldn't have to patch the
kernel at all.

I for one would be content if I could find a magic cookie value
that lets me avoid adding instructions to the TLB refill handler.


Johannes


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-19 12:38 ` LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?] Johannes Stezenbach
  2002-07-19 15:54   ` Richard Hodges
@ 2002-07-25 16:25   ` Johannes Stezenbach
  2002-07-25 17:06     ` Jun Sun
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 16:25 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

On Fri, Jul 19, 2002 at 02:38:29PM +0200, Johannes Stezenbach wrote:
> On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
> > I'm benchmarking some code that does lots of
> > semaphores, and with the libc from the "standard"
> > MIPS/SGI RH 7.1 distribution, those are done using
> > sysmips, in the interest of universality.
> 
> I'm working on a platform without LL/SC, an embedded system/SOC
> with a NEC VR4120A CPU core. To find out the effect of sysmips
> vs. emulated LL/SC vs. the branch-likely trick posted by
> Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
> I created an experimental patch for glibc-2.2.5 which allows
> run-time switching of the _test_and_set() and __compare_and_swap()
> implementation based on the presence of two "switch files" in /etc/.
... 
> For lack of a better benchmark I used some of the examples from
> glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
> of three successive runs of 'time exN >/dev/null'.

I did some more benchmarking with a test application based on
gtk+-directfb (http://directfb.org/). The benchmark does not
include GUI stuff, but rather reading of lots of external data
into internal data structures (which are GLib-2.0 GObjects).
The test application has three threads, but nearly all processing
is done in the main thread.

I think that the numbers are meaningful for our type of application.

sysmips:
        real    1m19.358s
        user    0m28.150s
        sys     0m47.250s

LL/SC emulation:
        real    0m41.246s
        user    0m25.390s
        sys     0m12.240s

branch-likely hack (hm, still without kernel patch...):
        real    0m25.126s
        user    0m17.240s
        sys     0m2.310s


Regards,
Johannes


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 16:25   ` Johannes Stezenbach
@ 2002-07-25 17:06     ` Jun Sun
  2002-07-25 18:45       ` Johannes Stezenbach
  0 siblings, 1 reply; 12+ messages in thread
From: Jun Sun @ 2002-07-25 17:06 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

Johannes Stezenbach wrote:
> On Fri, Jul 19, 2002 at 02:38:29PM +0200, Johannes Stezenbach wrote:
> 
>>On Fri, Jul 12, 2002 at 03:04:07PM +0200, Kevin D. Kissell wrote:
>>
>>>I'm benchmarking some code that does lots of
>>>semaphores, and with the libc from the "standard"
>>>MIPS/SGI RH 7.1 distribution, those are done using
>>>sysmips, in the interest of universality.
>>
>>I'm working on a platform without LL/SC, an embedded system/SOC
>>with a NEC VR4120A CPU core. To find out the effect of sysmips
>>vs. emulated LL/SC vs. the branch-likely trick posted by
>>Kevin D. Kissell <kevink@mips.com> on Tue, 22 Jan 2002 18:16:25 +0100
>>I created an experimental patch for glibc-2.2.5 which allows
>>run-time switching of the _test_and_set() and __compare_and_swap()
>>implementation based on the presence of two "switch files" in /etc/.
> 
> ... 
> 
>>For lack of a better benchmark I used some of the examples from
>>glibc-2.2.5/linuxthreads/Examples. The numbers are from the third
>>of three successive runs of 'time exN >/dev/null'.
> 
> 
> I did some more benchmarking with a test application based on
> gtk+-directfb (http://directfb.org/). The benchmark does not
> include GUI stuff, but rather reading of lots of external data
> into internal data structures (which are GLib-2.0 GObjects).
> The test application has three threads, but nearly all processing
> is done in the main thread.
> 
> I think that the numbers are meaningful for our type of application.
> 
> sysmips:
>         real    1m19.358s
>         user    0m28.150s
>         sys     0m47.250s
> 
> LL/SC emulation:
>         real    0m41.246s
>         user    0m25.390s
>         sys     0m12.240s
> 
> branch-likely hack (hm, still without kernel patch...):
>         real    0m25.126s
>         user    0m17.240s
>         sys     0m2.310s

Johannes,

This is great stuff!  Can you explain what are "real", "user", and "sys"? 
Also, what is your initial conclusion?

Jun


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 17:06     ` Jun Sun
@ 2002-07-25 18:45       ` Johannes Stezenbach
  2002-07-25 18:56         ` Jun Sun
  2002-07-25 21:49         ` Kevin D. Kissell
  0 siblings, 2 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 18:45 UTC (permalink / raw)
  To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips

On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
> Johannes Stezenbach wrote:
> >sysmips:
> >        real    1m19.358s
> >        user    0m28.150s
> >        sys     0m47.250s
> >
> >LL/SC emulation:
> >        real    0m41.246s
> >        user    0m25.390s
> >        sys     0m12.240s
> >
> >branch-likely hack (hm, still without kernel patch...):
> >        real    0m25.126s
> >        user    0m17.240s
> >        sys     0m2.310s
> 
> Johannes,
> 
> This is great stuff!  Can you explain what are "real", "user", and "sys"? 
> Also, what is your initial conclusion?

These are results from simple 'time ./testapp' testing, so it's real time
and user/system time reported by wait(4).
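
For reference, a minimal sketch of how those three numbers can be
collected, assuming the wait(4) above refers to the wait4() system
call. The names time_command and struct run_times are hypothetical.
"real" is wall-clock time, "user" is CPU time spent in user mode and
"sys" is CPU time spent in the kernel on behalf of the process.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

struct run_times
{
  double real, user, sys;       /* all in seconds */
};

/* Runs argv[0] with arguments argv and fills in *out.  Returns the
   wait status of the child, or -1 if fork() or wait4() failed.  */
int
time_command (char *const argv[], struct run_times *out)
{
  struct timeval t0, t1;
  struct rusage ru;
  int status;
  pid_t pid;

  gettimeofday (&t0, NULL);
  pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execvp (argv[0], argv);
      _exit (127);              /* exec failed */
    }
  if (wait4 (pid, &status, 0, &ru) < 0)
    return -1;
  gettimeofday (&t1, NULL);

  out->real = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
  out->user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
  out->sys = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
  return status;
}
```

A sysmips-based lock shows up in "sys" because each operation enters
the kernel, while the branch-likely sequence stays in "user".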

Also, I have an interactive gtk+directfb application running. The
difference in response time is quite noticeable.

One reason for the big differences is that the Glib-2.0/GObject library
does a lot of locking in its internal type system for every object
created. Other software might not suffer as badly from a slow mutex
implementation.

My conclusion is that it is good for glibc to always use ll/sc,
emulated or not, and for my specific needs I will use the branch-likely
hack. So next I will study kernel source to decide what MAGIC_COOKIE
is best for the branch-likely hack, and where to add 'move k1,$0'
before eret.

OTOH I doubt it's worth it to add the branch-likely hack to
stock glibc. How many people are using Linux/MIPS on embedded
CPUs without LL/SC?


Johannes


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 18:45       ` Johannes Stezenbach
@ 2002-07-25 18:56         ` Jun Sun
  2002-07-25 19:24           ` Johannes Stezenbach
  2002-07-25 21:49         ` Kevin D. Kissell
  1 sibling, 1 reply; 12+ messages in thread
From: Jun Sun @ 2002-07-25 18:56 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Kevin D. Kissell, linux-mips

Johannes Stezenbach wrote:
> On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
> 
>>Johannes Stezenbach wrote:
>>
>>>sysmips:
>>>       real    1m19.358s
>>>       user    0m28.150s
>>>       sys     0m47.250s
>>>
>>>LL/SC emulation:
>>>       real    0m41.246s
>>>       user    0m25.390s
>>>       sys     0m12.240s
>>>
>>>branch-likely hack (hm, still without kernel patch...):
>>>       real    0m25.126s
>>>       user    0m17.240s
>>>       sys     0m2.310s
>>
>>Johannes,
>>
>>This is great stuff!  Can you explain what are "real", "user", and "sys"? 
>>Also, what is your initial conclusion?
> 
> 
> These are results from simple 'time ./testapp' testing, so it's real time
> and user/system time reported by wait(4).
> 
> Also, I have an interactive gtk+directfb application running. The
> difference in response time is quite noticeable.
> 
> One reason for the big differences is that the Glib-2.0/GObject library
> does a lot of locking in its internal type system for every object
> created. Other software might not suffer as badly from a slow mutex
> implementation.
> 
> My conclusion is that it is good for glibc to always use ll/sc,
> emulated or not, and for my specific needs I will use the branch-likely
> hack. So next I will study kernel source to decide what MAGIC_COOKIE
> is best for the branch-likely hack, and where to add 'move k1,$0'
> before eret.
> 
> OTOH I doubt it's worth it to add the branch-likely hack to
> stock glibc. How many people are using Linux/MIPS on embedded
> CPUs without LL/SC?
> 

There are probably more than you think.  The popular (and notorious) NEC
VR41xx family falls into this category.  I think at least one or two other
families of CPUs are like this too.

Jun


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 18:56         ` Jun Sun
@ 2002-07-25 19:24           ` Johannes Stezenbach
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Stezenbach @ 2002-07-25 19:24 UTC (permalink / raw)
  To: Jun Sun; +Cc: Kevin D. Kissell, linux-mips

On Thu, Jul 25, 2002 at 11:56:23AM -0700, Jun Sun wrote:
> Johannes Stezenbach wrote:
> >OTOH I doubt it's worth it to add the branch-likely hack to
> >stock glibc. How many people are using Linux/MIPS on embedded
>CPUs without LL/SC?
> 
> There are probably more than you think.  The popular (and notorious) NEC
> VR41xx family falls into this category.  I think at least one or two other
> families of CPUs are like this too.

Ok, then maybe we should have /proc/sys/ entries where the kernel
tells glibc about CPU capabilities and kernel support for
userspace atomic operations, like:

- no /proc/sys/mips/* : use ll/sc (maybe emulated)
- have /proc/sys/mips/mips2-without-llsc: use branch-likely, read
  to get MAGIC_COOKIE to use
- have /proc/sys/mips/sony-ps2: whatever
- ...

Or use a sysmips() call to get the information.
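
On the libc side, the /proc/sys variant could look roughly like the
sketch below. Everything here is an assumption: no kernel exports such
an entry today, the path is passed in by the caller, and the names
probe_atomic_method, USE_LLSC and USE_BRANCH_LIKELY are made up. The
file is assumed to contain the cookie as a hexadecimal number.

```c
#include <stdio.h>

enum atomic_method
{
  USE_LLSC,                     /* real or kernel-emulated ll/sc */
  USE_BRANCH_LIKELY             /* beql sequence with a k1 cookie */
};

/* Looks for the (hypothetical) capability file at path.  If it exists
   and holds a hexadecimal value, stores that value in *cookie and
   selects the branch-likely method; otherwise falls back to ll/sc.  */
enum atomic_method
probe_atomic_method (const char *path, unsigned long *cookie)
{
  enum atomic_method method = USE_LLSC;
  FILE *f = fopen (path, "r");

  if (f != NULL)
    {
      if (fscanf (f, "%lx", cookie) == 1)
        method = USE_BRANCH_LIKELY;
      fclose (f);
    }
  return method;
}
```

glibc would call this once at startup, for example with
"/proc/sys/mips/mips2-without-llsc", and set its function pointers
accordingly.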


Johannes


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 18:45       ` Johannes Stezenbach
  2002-07-25 18:56         ` Jun Sun
@ 2002-07-25 21:49         ` Kevin D. Kissell
  2002-07-26 19:35           ` Kevin D. Kissell
  1 sibling, 1 reply; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-25 21:49 UTC (permalink / raw)
  To: Johannes Stezenbach; +Cc: Jun Sun, linux-mips

Johannes Stezenbach wrote:
> 
> On Thu, Jul 25, 2002 at 10:06:55AM -0700, Jun Sun wrote:
> > Johannes Stezenbach wrote:
> > >sysmips:
> > >        real    1m19.358s
> > >        user    0m28.150s
> > >        sys     0m47.250s
> > >
> > >LL/SC emulation:
> > >        real    0m41.246s
> > >        user    0m25.390s
> > >        sys     0m12.240s
> > >
> > >branch-likely hack (hm, still without kernel patch...):
> > >        real    0m25.126s
> > >        user    0m17.240s
> > >        sys     0m2.310s
> >
> > Johannes,
> >
> > This is great stuff!  Can you explain what are "real", "user", and "sys"?
> > Also, what is your initial conclusion?
> 
> These are results from simple 'time ./testapp' testing, so it's real time
> and user/system time reported by wait(4).
> 
> Also, I have an interactive gtk+directfb application running. The
> difference in response time is quite noticeable.
> 
> One reason for the big differences is that the Glib-2.0/GObject library
> does a lot of locking in its internal type system for every object
> created. Other software might not suffer as badly from a slow mutex
> implementation.
> 
> My conclusion is that it is good for glibc to always use ll/sc,
> emulated or not, and for my specific needs I will use the branch-likely
> hack. So next I will study kernel source to decide what MAGIC_COOKIE
> is best for the branch-likely hack, and where to add 'move k1,$0'
> before eret.

I am convinced that there is a value, quite possibly 0xffdadaff,
which can provably never be in k1 at the return from an exception
in a sane system - but it would be tedious to prove, and the
assumption could very easily be perturbed. I think
that adding overhead to the TLB refill handler would be
highly undesirable, but fortunately the TLB refill handler
is one of those cases where we can be sure that members
of a set of values (including 0xffdadaff) could not be
in k1 unless the system was about to crash in any case.
The prudent thing to do would be to load the MAGIC_COOKIE
value explicitly into k1 on the way out of general exception
service.  Fortunately, it looks to me as if at least half
of the overhead of this operation (LUI/ORI) can be concealed
in branch/jump delay slots that are currently going unfilled.

> OTOH I doubt it's worth it to add the branch-likely hack to
> stock glibc. How many people are using Linux/MIPS on embedded
> CPUs without LL/SC?

MIPSII-but-no-LL/SC CPUs are certainly the minority as far
as distinct designs and part numbers are concerned, but
I suspect they provide the overwhelming majority of actual
MIPS/Linux platforms in use, since both NEC Vr41xx-based 
handhelds and Sony PlayStation 2's fall into that category.  
Someone else may have better statistics than I do, though.

		Regards,

		Kevin K.


* Re: LL/SC benchmarking [was: Mipsel libc with LL/SC online anywhere?]
  2002-07-25 21:49         ` Kevin D. Kissell
@ 2002-07-26 19:35           ` Kevin D. Kissell
  0 siblings, 0 replies; 12+ messages in thread
From: Kevin D. Kissell @ 2002-07-26 19:35 UTC (permalink / raw)
  To: Johannes Stezenbach, Jun Sun, linux-mips

"Kevin D. Kissell" wrote:
> The prudent thing to do would be to load the MAGIC_COOKIE
> value explicitly into k1 on the way out of general exception
> service.  Fortunately, it looks to me as if at least half
> of the overhead of this operation (LUI/ORI) can be concealed
> in branch/jump delay slots that are currently going unfilled.

I was distracted in the middle of that reply and got confused.
The MAGIC_COOKIE needs only to be destroyed, which can be 
done in a single instruction.  The 100% safe approach would
be to insert a "move k1,zero" instruction before all ERETs,
including those generated by the RESTORE_ALL_AND_RET macro
expansion, but it should be faster, if very slightly larger
and somewhat more burdensome for maintenance, to plant those
instructions in branch delay slots just "upstream" from the
context restore.  I'm working on the code a bit and may
be able to propose (if not test :-) a patch along these
lines, but in looking at entry.S, I note something a bit
disturbing:

There's a lot of code in there that allows the assembler
to schedule the instructions, but which also contains
SSNOPs to force timing.  Isn't that a bit dangerous?
Unless it is specified that the assembler will refuse
to reschedule SSNOPs, I think those sequences need to
be bracketed with .set noreorder/.set reorder directives.  That would
also allow k1 destructors to be placed explicitly in
the delay slots, rather than assuming that they will
be put there by the assembler.

	Regards,

	Kevin K.

