All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] parisc: Improve LWS-CAS performance
@ 2014-05-11 22:40 John David Anglin
  2014-05-12  6:11 ` Rolf Eike Beer
  2014-05-14  3:31 ` Carlos O'Donell
  0 siblings, 2 replies; 8+ messages in thread
From: John David Anglin @ 2014-05-11 22:40 UTC (permalink / raw)
  To: linux-parisc List; +Cc: Helge Deller, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

The attached change significantly improves the performance of the LWS- 
CAS code in syscall.S.
This allows a number of packages to build (e.g., zeromq3, gtest and  
libxs) that previously failed because
slow LWS-CAS performance under contention.  In particular, interrupts  
taken while the lock was taken
degraded performance significantly.

The change does the following:

1) Disables interrupts around the CAS operation,
2) Removes the lock code on UP systems where it is not needed, and
3) Changes the loads and stores to use the ordered completer, "o", on
PA 2.0.  "o" and "ma" with a zero offset are equivalent.  The latter  
is accepted
on both PA 1.X and 2.0.

The use of ordered loads and stores probably makes no difference on  
all existing
hardware, but it seemed pedantically correct.  In particular, the CAS  
operation must
complete before LDCW lock is released.  As written before, a processor  
could reorder
the operations.

I don't believe the period interrupts are disabled is long enough to  
significantly
increase interrupt latency.  For example, the TLB insert code is  
longer.  Worst
case is a memory fault in the CAS operation.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
---



[-- Attachment #2: syscall.S.d.txt --]
[-- Type: text/plain, Size: 1337 bytes --]

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index a63bb179..801ba08 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -589,12 +589,16 @@ cas_nocontend:
 # endif
 /* ENABLE_LWS_DEBUG */
 
+	rsm	PSW_SM_I, %r0				/* Disable interrupts */
+#ifdef CONFIG_SMP
 	LDCW	0(%sr2,%r20), %r28			/* Try to acquire the lock */
 	cmpb,<>,n	%r0, %r28, cas_action		/* Did we get it? */
 cas_wouldblock:
 	ldo	2(%r0), %r28				/* 2nd case */
+	ssm	PSW_SM_I, %r0
 	b	lws_exit				/* Contended... */
 	ldo	-EAGAIN(%r0), %r21			/* Spin in userspace */
+#endif
 
 	/*
 		prev = *addr;
@@ -619,15 +623,19 @@ cas_action:
 	stw	%r1, 4(%sr2,%r20)
 #endif
 	/* The load and store could fail */
-1:	ldw	0(%sr3,%r26), %r28
+1:	ldw,ma	0(%sr3,%r26), %r28
 	sub,<>	%r28, %r25, %r0
-2:	stw	%r24, 0(%sr3,%r26)
+2:	stw,ma	%r24, 0(%sr3,%r26)
+#ifdef CONFIG_SMP
 	/* Free lock */
-	stw	%r20, 0(%sr2,%r20)
+	stw,ma	%r20, 0(%sr2,%r20)
+#endif
 #if ENABLE_LWS_DEBUG
 	/* Clear thread register indicator */
 	stw	%r0, 4(%sr2,%r20)
 #endif
+	/* Enable interrupts */
+	ssm	PSW_SM_I, %r0
 	/* Return to userspace, set no error */
 	b	lws_exit
 	copy	%r0, %r21
@@ -639,6 +647,7 @@ cas_action:
 #if ENABLE_LWS_DEBUG
 	stw	%r0, 4(%sr2,%r20)
 #endif
+	ssm	PSW_SM_I, %r0
 	b	lws_exit
 	ldo	-EFAULT(%r0),%r21	/* set errno */
 	nop

[-- Attachment #3: Type: text/plain, Size: 45 bytes --]



--
John David Anglin	dave.anglin@bell.net


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-11 22:40 [PATCH] parisc: Improve LWS-CAS performance John David Anglin
@ 2014-05-12  6:11 ` Rolf Eike Beer
  2014-05-12 14:33   ` John David Anglin
  2014-05-14  3:31 ` Carlos O'Donell
  1 sibling, 1 reply; 8+ messages in thread
From: Rolf Eike Beer @ 2014-05-12  6:11 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc List, Helge Deller, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

Am Sonntag, 11. Mai 2014, 18:40:50 schrieb John David Anglin:
> The attached change significantly improves the performance of the LWS-
> CAS code in syscall.S.
> This allows a number of packages to build (e.g., zeromq3, gtest and
> libxs) that previously failed because
> slow LWS-CAS performance under contention.  In particular, interrupts
> taken while the lock was taken
> degraded performance significantly.

Sounds cool ;)

The label cas_wouldblock would now only be defined in SMP case, which looks 
odd. Since I can't find any references to it I think it should just be dropped.

Eike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-12  6:11 ` Rolf Eike Beer
@ 2014-05-12 14:33   ` John David Anglin
  2014-05-12 15:04     ` Aw: " Helge Deller
  0 siblings, 1 reply; 8+ messages in thread
From: John David Anglin @ 2014-05-12 14:33 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-parisc List, Helge Deller, James Bottomley

On 5/12/2014 2:11 AM, Rolf Eike Beer wrote:
> Am Sonntag, 11. Mai 2014, 18:40:50 schrieb John David Anglin:
>> The attached change significantly improves the performance of the LWS-
>> CAS code in syscall.S.
>> This allows a number of packages to build (e.g., zeromq3, gtest and
>> libxs) that previously failed because
>> slow LWS-CAS performance under contention.  In particular, interrupts
>> taken while the lock was taken
>> degraded performance significantly.
> Sounds cool ;)
>
> The label cas_wouldblock would now only be defined in SMP case, which looks
> odd. Since I can't find any references to it I think it should just be dropped.
I was aware when I sent the change that the label cas_wouldblock is not 
used.  Possibly, it should
change to a comment but maybe that's a separate cleanup.

Dave

-- 
John David Anglin    dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Aw: Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-12 14:33   ` John David Anglin
@ 2014-05-12 15:04     ` Helge Deller
  0 siblings, 0 replies; 8+ messages in thread
From: Helge Deller @ 2014-05-12 15:04 UTC (permalink / raw)
  To: John David Anglin; +Cc: Rolf Eike Beer, linux-parisc List, James Bottomley

> On 5/12/2014 2:11 AM, Rolf Eike Beer wrote:
> > Am Sonntag, 11. Mai 2014, 18:40:50 schrieb John David Anglin:
> >> The attached change significantly improves the performance of the LWS-
> >> CAS code in syscall.S.
> >> This allows a number of packages to build (e.g., zeromq3, gtest and
> >> libxs) that previously failed because
> >> slow LWS-CAS performance under contention.  In particular, interrupts
> >> taken while the lock was taken
> >> degraded performance significantly.
> > Sounds cool ;)
> >
> > The label cas_wouldblock would now only be defined in SMP case, which looks
> > odd. Since I can't find any references to it I think it should just be dropped.

> I was aware when I sent the change that the label cas_wouldblock is not 
> used.  Possibly, it should change to a comment but maybe that's a separate cleanup.

I left it in when I pushed your patch to my git tree:
http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/log/?h=for-next

I think it makes sense to leave it in because it makes this piece clear.
In addition I added CC to push it backwards into kernels 3.13 and 3.14 too.

Helge

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-11 22:40 [PATCH] parisc: Improve LWS-CAS performance John David Anglin
  2014-05-12  6:11 ` Rolf Eike Beer
@ 2014-05-14  3:31 ` Carlos O'Donell
  2014-05-14 19:12   ` Helge Deller
  2014-05-15 12:33   ` [PATCH] parisc: Improve LWS-CAS performance (take 2) John David Anglin
  1 sibling, 2 replies; 8+ messages in thread
From: Carlos O'Donell @ 2014-05-14  3:31 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc List, Helge Deller, James Bottomley

On Sun, May 11, 2014 at 6:40 PM, John David Anglin <dave.anglin@bell.net> wrote:
> The attached change significantly improves the performance of the LWS-CAS
> code in syscall.S.
> This allows a number of packages to build (e.g., zeromq3, gtest and libxs)
> that previously failed because
> slow LWS-CAS performance under contention.  In particular, interrupts taken
> while the lock was taken
> degraded performance significantly.
>
> The change does the following:
>
> 1) Disables interrupts around the CAS operation,

OK.

> 2) Removes the lock code on UP systems where it is not needed, and

This is wrong. The locking was originally only for SMP, but at one
point it was discovered that you could COW during the CAS operation
and if you didn't take the lock in UP, you could re-enter the CAS via
another task and thus incorrectly carry out the CAS in two different
threads and corrupt the state. While we attempt our best to prevent
the task running the CAS from being scheduled off, the COW operation
trap and subsequent scheduling is not something I could find an easy
way to prevent. Therefore the solution was to use locks for both UP
and SMP.

> 3) Changes the loads and stores to use the ordered completer, "o", on
> PA 2.0.  "o" and "ma" with a zero offset are equivalent.  The latter is
> accepted
> on both PA 1.X and 2.0.
>
> The use of ordered loads and stores probably makes no difference on all
> existing
> hardware, but it seemed pedantically correct.  In particular, the CAS
> operation must
> complete before LDCW lock is released.  As written before, a processor could
> reorder
> the operations.

Agreed.

> I don't believe the period interrupts are disabled is long enough to
> significantly
> increase interrupt latency.  For example, the TLB insert code is longer.
> Worst
> case is a memory fault in the CAS operation.

The worst case is actually a COW during the write of the CAS? The
interrupts would be disabled until the kernel turned them back on in
another code path.


Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-14  3:31 ` Carlos O'Donell
@ 2014-05-14 19:12   ` Helge Deller
  2014-05-14 19:28     ` John David Anglin
  2014-05-15 12:33   ` [PATCH] parisc: Improve LWS-CAS performance (take 2) John David Anglin
  1 sibling, 1 reply; 8+ messages in thread
From: Helge Deller @ 2014-05-14 19:12 UTC (permalink / raw)
  To: Carlos O'Donell, John David Anglin; +Cc: linux-parisc List, James Bottomley

On 05/14/2014 05:31 AM, Carlos O'Donell wrote:
> On Sun, May 11, 2014 at 6:40 PM, John David Anglin <dave.anglin@bell.net> wrote:
>> 2) Removes the lock code on UP systems where it is not needed, and
> 
> This is wrong. The locking was originally only for SMP, but at one
> point it was discovered that you could COW during the CAS operation
> and if you didn't take the lock in UP, you could re-enter the CAS via
> another task and thus incorrectly carry out the CAS in two different
> threads and corrupt the state. While we attempt our best to prevent
> the task running the CAS from being scheduled off, the COW operation
> trap and subsequent scheduling is not something I could find an easy
> way to prevent. Therefore the solution was to use locks for both UP
> and SMP.

If so, could one of you please send an updated patch to the list...?
Currently I've scheduled http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=f4030047d92c79344ac4ca6ad4c6e3e22ceb5a55
for inclusion into the next kernel (hopefully 3.15).

Helge

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] parisc: Improve LWS-CAS performance
  2014-05-14 19:12   ` Helge Deller
@ 2014-05-14 19:28     ` John David Anglin
  0 siblings, 0 replies; 8+ messages in thread
From: John David Anglin @ 2014-05-14 19:28 UTC (permalink / raw)
  To: Helge Deller, Carlos O'Donell; +Cc: linux-parisc List, James Bottomley

On 5/14/2014 3:12 PM, Helge Deller wrote:
> On 05/14/2014 05:31 AM, Carlos O'Donell wrote:
>> On Sun, May 11, 2014 at 6:40 PM, John David Anglin <dave.anglin@bell.net> wrote:
>>> 2) Removes the lock code on UP systems where it is not needed, and
>> This is wrong. The locking was originally only for SMP, but at one
>> point it was discovered that you could COW during the CAS operation
>> and if you didn't take the lock in UP, you could re-enter the CAS via
>> another task and thus incorrectly carry out the CAS in two different
>> threads and corrupt the state. While we attempt our best to prevent
>> the task running the CAS from being scheduled off, the COW operation
>> trap and subsequent scheduling is not something I could find an easy
>> way to prevent. Therefore the solution was to use locks for both UP
>> and SMP.
> If so, could one of you please send an updated patch to the list...?
> Currently I've scheduled http://git.kernel.org/cgit/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=f4030047d92c79344ac4ca6ad4c6e3e22ceb5a55
> for inclusion into the next kernel (hopefully 3.15).
I agree that a COW could occur but I haven't seen evidence that threads are
getting rescheduled.  We did fix some code in the fault handling a few 
months
ago.  Anyway, I have no strong objection to removing ifdefs and 
resending after
testing.

Dave

-- 
John David Anglin    dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] parisc: Improve LWS-CAS performance (take 2)
  2014-05-14  3:31 ` Carlos O'Donell
  2014-05-14 19:12   ` Helge Deller
@ 2014-05-15 12:33   ` John David Anglin
  1 sibling, 0 replies; 8+ messages in thread
From: John David Anglin @ 2014-05-15 12:33 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: linux-parisc List, Helge Deller, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

On 13-May-14, at 11:31 PM, Carlos O'Donell wrote:

>> 2) Removes the lock code on UP systems where it is not needed, and
>
> This is wrong. The locking was originally only for SMP, but at one
> point it was discovered that you could COW during the CAS operation
> and if you didn't take the lock in UP, you could re-enter the CAS via
> another task and thus incorrectly carry out the CAS in two different
> threads and corrupt the state.

The attached file updates the change so that the lock code is no  
longer removed
on UP systems.  I added a comment to indicate that contention can  
occur even on
UP systems.

Lightly tested with a build of kyotocabinet on c3750.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
---


[-- Attachment #2: syscall.S.d.1.txt --]
[-- Type: text/plain, Size: 1315 bytes --]

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index a63bb179..8387860 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -589,10 +589,13 @@ cas_nocontend:
 # endif
 /* ENABLE_LWS_DEBUG */
 
+	rsm	PSW_SM_I, %r0				/* Disable interrupts */
+	/* COW breaks can cause contention on UP systems */
 	LDCW	0(%sr2,%r20), %r28			/* Try to acquire the lock */
 	cmpb,<>,n	%r0, %r28, cas_action		/* Did we get it? */
 cas_wouldblock:
 	ldo	2(%r0), %r28				/* 2nd case */
+	ssm	PSW_SM_I, %r0
 	b	lws_exit				/* Contended... */
 	ldo	-EAGAIN(%r0), %r21			/* Spin in userspace */
 
@@ -619,15 +622,17 @@ cas_action:
 	stw	%r1, 4(%sr2,%r20)
 #endif
 	/* The load and store could fail */
-1:	ldw	0(%sr3,%r26), %r28
+1:	ldw,ma	0(%sr3,%r26), %r28
 	sub,<>	%r28, %r25, %r0
-2:	stw	%r24, 0(%sr3,%r26)
+2:	stw,ma	%r24, 0(%sr3,%r26)
 	/* Free lock */
-	stw	%r20, 0(%sr2,%r20)
+	stw,ma	%r20, 0(%sr2,%r20)
 #if ENABLE_LWS_DEBUG
 	/* Clear thread register indicator */
 	stw	%r0, 4(%sr2,%r20)
 #endif
+	/* Enable interrupts */
+	ssm	PSW_SM_I, %r0
 	/* Return to userspace, set no error */
 	b	lws_exit
 	copy	%r0, %r21
@@ -639,6 +644,7 @@ cas_action:
 #if ENABLE_LWS_DEBUG
 	stw	%r0, 4(%sr2,%r20)
 #endif
+	ssm	PSW_SM_I, %r0
 	b	lws_exit
 	ldo	-EFAULT(%r0),%r21	/* set errno */
 	nop

[-- Attachment #3: Type: text/plain, Size: 45 bytes --]



--
John David Anglin	dave.anglin@bell.net


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-05-15 12:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-11 22:40 [PATCH] parisc: Improve LWS-CAS performance John David Anglin
2014-05-12  6:11 ` Rolf Eike Beer
2014-05-12 14:33   ` John David Anglin
2014-05-12 15:04     ` Aw: " Helge Deller
2014-05-14  3:31 ` Carlos O'Donell
2014-05-14 19:12   ` Helge Deller
2014-05-14 19:28     ` John David Anglin
2014-05-15 12:33   ` [PATCH] parisc: Improve LWS-CAS performance (take 2) John David Anglin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.