All of lore.kernel.org
 help / color / mirror / Atom feed
* Serious problem with ticket spinlocks on ia64
@ 2010-08-27 13:37 ` Petr Tesarik
  0 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 13:37 UTC (permalink / raw)
  To: linux-ia64; +Cc: linux-kernel, Tony Luck, Hedi Berriche

Hi everybody,

SGI has recently experienced failures with the new ticket spinlock
implementation. Hedi Berriche sent me a simple test case that can
trigger the failure on the siglock. To debug the issue, I wrote a
small module that watches writes to current->sighand->siglock and
records the values.

I observed that the __ticket_spin_lock() primitive fails when the
tail wraps around to zero. I reconstructed the following:

CPU 7 holds the spinlock
CPU 5 wants to acquire the spinlock
Spinlock value is 0xfffcffff
	 (now serving 0x7fffe, next ticket 0x7ffff)

CPU 7 executes st2.rel to release the spinlock.
At the same time CPU 5 executes a fetchadd4.acq.
The resulting lock value is 0xfffe0000 (correct), and CPU 5 has
recorded its ticket number (0x7fff).

Consequently, the first spinlock loop iteration succeeds, and CPU 5
now holds the spinlock.

Next, CPU 5 releases the spinlock with st2.rel, changing the lock
value to 0x0 (correct).

SO FAR SO GOOD.

Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
Interestingly, CPU 5 and CPU 7 are both granted the same ticket,
and the spinlock value (as seen from the debug fault handler) is
0x0 after single-stepping over the fetchadd4.acq, in both cases.
CPU 4 correctly sets the spinlock value to 0x1.

I don't know whether the simultaneous acquire attempt and release are
necessary to trigger the bug, but I'm noting it here.

I've only seen this happen when the spinlock wraps around to zero,
but I cannot rule out that it happens in other cases as well.

In any case, there seems to be a serious problem with memory
ordering, and I'm not enough of an expert to tell exactly what it is.

Any ideas?

Petr Tesarik
L3 International
Novell, Inc.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 13:37 ` Petr Tesarik
@ 2010-08-27 13:48   ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-08-27 13:48 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Tony Luck

On Fri, Aug 27, 2010 at 14:38 Petr Tesarik wrote:
| Hi everybody,
| 
| SGI has recently experienced failures with the new ticket spinlock
| implementation. Hedi Berriche sent me a simple test case that can
| trigger the failure on the siglock.

One more fact: the problem was introduced by commit

    commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
    Author: Tony Luck <tony.luck@intel.com>
    Date:   Wed Oct 7 10:54:19 2009 -0700

        [IA64] Squeeze ticket locks back into 4 bytes.

Reverting the patch makes the problem go away.

IOW, and as far as testing shows, the first incarnation of the ticket locks
implementation on IA64 (commit 2c8696), the one that used 8 bytes, does not
exhibit this problem.

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 13:48   ` Hedi Berriche
@ 2010-08-27 14:09     ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 14:09 UTC (permalink / raw)
  To: Hedi Berriche; +Cc: linux-ia64, linux-kernel, Tony Luck

On Friday 27 of August 2010 15:48:02 Hedi Berriche wrote:
> On Fri, Aug 27, 2010 at 14:38 Petr Tesarik wrote:
> | Hi everybody,
> |
> | SGI has recently experienced failures with the new ticket spinlock
> | implementation. Hedi Berriche sent me a simple test case that can
> | trigger the failure on the siglock.
>
> One more fact, the problem was introduced by commit
>
>     commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
>     Author: Tony Luck <tony.luck@intel.com>
>     Date:   Wed Oct 7 10:54:19 2009 -0700
>
>         [IA64] Squeeze ticket locks back into 4 bytes.
>
> Reverting the patch makes the problem go away.
>
> IOW, and as far as testing shows, the first incarnation of the ticket locks
> implementation on IA64 (commit 2c8696), the one that used 8 bytes, does not
> exhibit this problem.

I wouldn't be so sure about that. Given that I have only observed the problem 
when the spinlock value wraps around, an 8-byte spinlock might simply need 
much more time to trigger the bug.

Just my two cents,
Petr Tesarik
L3 International
Novell, Inc.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 14:09     ` Petr Tesarik
@ 2010-08-27 14:31       ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-08-27 14:31 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Tony Luck

On Fri, Aug 27, 2010 at 15:09 Petr Tesarik wrote:
| On Friday 27 of August 2010 15:48:02 Hedi Berriche wrote:
|
| > One more fact, the problem was introduced by commit
| >
| >     commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
| >     Author: Tony Luck <tony.luck@intel.com>
| >     Date:   Wed Oct 7 10:54:19 2009 -0700
| >
| >         [IA64] Squeeze ticket locks back into 4 bytes.
| >
| > Reverting the patch makes the problem go away.
| >
| > IOW, and as far as testing shows, the first incarnation of the ticket locks
| > implementation on IA64 (commit 2c8696), the one that used 8 bytes, does not
| > exhibit this problem.
| 
| I wouldn't be so sure about it. Given that I have only observed the problem 
| when the spinlock value wraps around, then an 8-byte spinlock might only need 
| much more time to trigger the bug.

That's a possibility and that's why I said "as far as testing shows".

That said, I'm letting my run, now over 36 hours in, carry on chewing CPU
time to see if it eventually trips the same problem seen with 4-byte
ticket locks.

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 14:31       ` Hedi Berriche
@ 2010-08-27 14:40         ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 14:40 UTC (permalink / raw)
  To: Hedi Berriche; +Cc: linux-ia64, linux-kernel, Tony Luck

On Friday 27 of August 2010 16:31:35 Hedi Berriche wrote:
> On Fri, Aug 27, 2010 at 15:09 Petr Tesarik wrote:
> | On Friday 27 of August 2010 15:48:02 Hedi Berriche wrote:
> | > One more fact, the problem was introduced by commit
> | >
> | >     commit 9d40ee200a527ce08ab8c793ba8ae3e242edbb0e
> | >     Author: Tony Luck <tony.luck@intel.com>
> | >     Date:   Wed Oct 7 10:54:19 2009 -0700
> | >
> | >         [IA64] Squeeze ticket locks back into 4 bytes.
> | >
> | > Reverting the patch makes the problem go away.
> | >
> | > IOW, and as far as testing shows, the first incarnation of the ticket
> | > locks implementation on IA64 (commit 2c8696), the one that used 8
> | > bytes, does not exhibit this problem.
> |
> | I wouldn't be so sure about it. Given that I have only observed the
> | problem when the spinlock value wraps around, then an 8-byte spinlock
> | might only need much more time to trigger the bug.
>
> That's a possibility and that's why I said "as far as testing shows".
>
> That said, I'm letting my already over 36 hours run carry on chewing CPU
> time, and see if it will eventually trip the same problem seen with 4-byte
> ticket locks.

Hm, this doesn't sound like a viable approach. Since the siglock gets 
initialized to 0 when a new process is started, it may never actually wrap 
around.

I would rather attach a SystemTap probe somewhere during process fork and add 
a bias to the siglock. That should work fine. Let me knock up the SystemTap 
script...

Petr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 14:40         ` Petr Tesarik
@ 2010-08-27 14:52           ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-08-27 14:52 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Tony Luck

On Fri, Aug 27, 2010 at 15:40 Petr Tesarik wrote:
| On Friday 27 of August 2010 16:31:35 Hedi Berriche wrote:
|
| > That said, I'm letting my already over 36 hours run carry on chewing CPU
| > time, and see if it will eventually trip the same problem seen with 4-byte
| > ticket locks.
| 
| Hm, this doesn't sound like a viable approach. Since the siglock gets 
| initialized to 0 when a new process is started, it may never actually wrap 
| around.
| 
| I would rather attach a SystemTap probe somewhere during process fork and add 
| a bias to the siglock. That should work fine. Let me knock up the SystemTap 
| script...

That would be nice. Ta!

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-27 13:37 ` Petr Tesarik
@ 2010-08-27 16:08   ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-27 16:08 UTC (permalink / raw)
  To: Petr Tesarik, linux-ia64; +Cc: linux-kernel, Hedi Berriche

> Hedi Berriche sent me a simple test case that can
> trigger the failure on the siglock.

Can you post the test case, please? How long does it typically take
to reproduce the problem?


> Next, CPU 5 releases the spinlock with st2.rel, changing the lock
> value to 0x0 (correct).
>
> SO FAR SO GOOD.
>
> Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
> Interestingly, CPU 5 and CPU 7 are both granted the same ticket,

What is the duplicate ticket number that CPUs 5 & 7 get at this point?
Presumably 0x0, yes? Or do they see a stale 0x7fff?

> and the spinlock value (as seen from the debug fault handler) is
> 0x0 after single-stepping over the fetchadd4.acq, in both cases.
> CPU 4 correctly sets the spinlock value to 0x1.

Is the fault handler using "ld.acq" to look at the spinlock value?
If not, then this might be a red herring. [Though clearly something
bad is going on here].

> Any ideas?

What cpu model are you running on?
What is the topological connection between CPU 4, 5 and 7 - are any of
them hyper-threaded siblings? Cores on same socket? N.B. topology may
change from boot to boot, so you may need to capture /proc/cpuinfo from
the same boot where this problem is detected. But the variation is
usually limited to which socket gets to own logical cpu 0.

If this is a memory ordering problem (and that seems quite plausible)
then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock
routines would probably make it go away.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 14:52           ` Hedi Berriche
@ 2010-08-27 16:37             ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 16:37 UTC (permalink / raw)
  To: Hedi Berriche; +Cc: linux-ia64, linux-kernel, Tony Luck

[-- Attachment #1: Type: text/plain, Size: 893 bytes --]

On Friday 27 of August 2010 16:52:31 Hedi Berriche wrote:
> On Fri, Aug 27, 2010 at 15:40 Petr Tesarik wrote:
> | On Friday 27 of August 2010 16:31:35 Hedi Berriche wrote:
> | > That said, I'm letting my already over 36 hours run carry on chewing
> | > CPU time, and see if it will eventually trip the same problem seen with
> | > 4-byte ticket locks.
> |
> | Hm, this doesn't sound like a viable approach. Since the siglock gets
> | initialized to 0 when a new process is started, it may never actually
> | wrap around.
> |
> | I would rather attach a SystemTap probe somewhere during process fork and
> | add a bias to the siglock. That should work fine. Let me knock up the
> | SystemTap script...
>
> That would be nice. Ta!

Here it is. I don't have a system with the old 64-bit ticket spinlocks at 
hand, so this is completely untested, but it should work fine.

Adjust if needed.

Petr

[-- Attachment #2: biaslock.stp --]
[-- Type: text/x-csrc, Size: 455 bytes --]

/* Bias the spin lock on fork */

%{
#define LOCK_BIAS ((1ULL << 32) - (1ULL << 15))
%}

function bias_siglock (task:long)
%{
	struct task_struct *tsk = (struct task_struct*)THIS->task;
	tsk->sighand->siglock.raw_lock.lock = LOCK_BIAS + (LOCK_BIAS << 32);
%}

function is_err (value:long)
%{
	THIS->__retvalue = IS_ERR_VALUE((unsigned long)THIS->value);
%}

probe kernel.function("copy_process").return
{
	if (!is_err($return))
		bias_siglock($return);
}

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 16:08   ` Luck, Tony
@ 2010-08-27 17:16     ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 17:16 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 18:08:03 Luck, Tony wrote:
> > Hedi Berriche sent me a simple test case that can
> > trigger the failure on the siglock.
>
> Can you post the test case please. How long does it typically take
> to reproduce the problem?

I'll let Hedi send it. It's really easy to reproduce; in fact, I can 
reproduce it within 5 minutes on an 8-CPU system.

> > Next, CPU 5 releases the spinlock with st2.rel, changing the lock
> > value to 0x0 (correct).
> >
> > SO FAR SO GOOD.
> >
> > Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
> > Interestingly, CPU 5 and CPU 7 are both granted the same ticket,
>
> What is the duplicate ticket number that CPUs 5 & 7 get at this point?
> Presumably 0x0, yes? Or do they see a stale 0x7fff?

They get a zero, yes.

> > and the spinlock value (as seen from the debug fault handler) is
> > 0x0 after single-stepping over the fetchadd4.acq, in both cases.
> > CPU 4 correctly sets the spinlock value to 0x1.
>
> Is the fault handler using "ld.acq" to look at the spinlock value?
> If not, then this might be a red herring. [Though clearly something
> bad is going on here].

Right. I also realized I was reading the spinlock value with a plain "ld4". 
When I changed it to "ld4.acq", this is what happens:

1. We're in _spin_lock_irq, which starts like this:

0xa0000001008ea000 <_spin_lock_irq>:    [MMI]       rsm 0x4000;;
0xa0000001008ea001 <_spin_lock_irq+1>:              fetchadd4.acq r15=[r32],1
0xa0000001008ea002 <_spin_lock_irq+2>:              nop.i 0x0;;

AFAICS the spinlock value should be 0x0 (after having wrapped around from 
0xffff0000 at release on the same CPU).

2. fetchadd4.acq generates a debug exception (because it writes to the watched 
location)
3. ld4.acq inside the debug fault handler reads 0x0 from the location
4. the handler sets PSR.ss on return
5. fetchadd4.acq puts 0x1 (why?) in r15 and generates a Single step fault
6. the fault handler now reads 0x0 (sic!) from the spinlock location (again, 
using ld4.acq)
7. the resulting kernel crash dump contains ZERO in the spinlock location

Maybe there's something wrong with my test module, since I'm already 
getting tired today, but there's definitely something wrong here. I'll try 
to polish it and send it here later.

> > Any ideas?
>
> What cpu model are you running on?
> What is the topological connection between CPU 4, 5 and 7 - are any of
> them hyper-threaded siblings? Cores on same socket? N.B. topology may
> change from boot to boot, so you may need to capture /proc/cpuinfo from
> the same boot where this problem is detected. But the variation is
> usually limited to which socket gets to own logical cpu 0.

There are two Dual-Core Intel(R) Itanium(R) 2 Processor 9150M in the test 
machine:

physical package 0
  core 0: CPU 0, CPU 4
  core 1: CPU 2, CPU 6

physical package 196611
  core 0: CPU 1, CPU 5
  core 1: CPU 3, CPU 7

/proc/cpuinfo says:

processor  : 0
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 0
thread id  : 0

processor  : 1
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 0
thread id  : 0

processor  : 2
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 1
thread id  : 0

processor  : 3
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 1
thread id  : 0

processor  : 4
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 0
thread id  : 1

processor  : 5
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 0
thread id  : 1

processor  : 6
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 1
thread id  : 1

processor  : 7
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 1
thread id  : 1

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
@ 2010-08-27 17:16     ` Petr Tesarik
  0 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 17:16 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 18:08:03 Luck, Tony wrote:
> > Hedi Berriche sent me a simple test case that can
> > trigger the failure on the siglock.
>
> Can you post the test case please. How long does it typically take
> to reproduce the problem?

I let Hedi send it. It's really easy to reproduce. In fact, I can reproduce it 
within 5 minutes on an 8-CPU system.

> > Next, CPU 5 releases the spinlock with st2.rel, changing the lock
> > value to 0x0 (correct).
> >
> > SO FAR SO GOOD.
> >
> > Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
> > Interestingly, CPU 5 and CPU 7 are both granted the same ticket,
>
> What is the duplicate ticket number that CPUs 5 & 7 get at this point?
> Presumably 0x0, yes? Or do they see a stale 0x7fff?

They get a zero, yes.
> > and the spinlock value (as seen from the debug fault handler) is
> > 0x0 after single-stepping over the fetchadd4.acq, in both cases.
> > CPU 4 correctly sets the spinlock value to 0x1.
>
> Is the fault handler using "ld.acq" to look at the spinlock value?
> If not, then this might be a red herring. [Though clearly something
> bad is going on here].

Right. I also realized I was reading the spinlock value with a plain "ld4". 
When I changed it to "ld4.acq", this is what happens:

1. We're in _spin_lock_irq, which starts like this:

0xa0000001008ea000 <_spin_lock_irq>:    [MMI]       rsm 0x4000;;
0xa0000001008ea001 <_spin_lock_irq+1>:              fetchadd4.acq r15=[r32],1
0xa0000001008ea002 <_spin_lock_irq+2>:              nop.i 0x0;;

AFAICS the spinlock value should be 0x0 (after having wrapped around from 
0xffff0000 at release on the same CPU).

2. fetchadd4.acq generates a debug exception (because it writes to the watched 
location)
3. ld4.acq inside the debug fault handler reads 0x0 from the location
4. the handler sets PSR.ss on return
5. fetchadd4.acq puts 0x1 (why?) in r15 and generates a Single step fault
6. the fault handler now reads 0x0 (sic!) from the spinlock location (again, 
using ld4.acq)
7. the resulting kernel crash dump contains ZERO in the spinlock location

Maybe there's something wrong with my test module, because I'm already 
getting tired today, but there's definitely something wrong here. I'll try to 
polish it and send it here later.

> > Any ideas?
>
> What cpu model are you running on?
> What is the topological connection between CPU 4, 5 and 7 - are any of
> them hyper-threaded siblings? Cores on same socket? N.B. topology may
> change from boot to boot, so you may need to capture /proc/cpuinfo from
> the same boot where this problem is detected. But the variation is
> usually limited to which socket gets to own logical cpu 0.

There are two Dual-Core Intel(R) Itanium(R) 2 Processor 9150M in the test 
machine:

physical package 0
  core 0: CPU 0, CPU 4
  core 1: CPU 2, CPU 6

physical package 196611
  core 0: CPU 1, CPU 5
  core 1: CPU 3, CPU 7

/proc/cpuinfo says:

processor  : 0
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 0
thread id  : 0

processor  : 1
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 0
thread id  : 0

processor  : 2
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 1
thread id  : 0

processor  : 3
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 1
thread id  : 0

processor  : 4
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 0
thread id  : 1

processor  : 5
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 0
thread id  : 1

processor  : 6
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 0
core id    : 1
thread id  : 1

processor  : 7
vendor     : GenuineIntel
arch       : IA-64
family     : 32
model      : 1
model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9150M
revision   : 1
archrev    : 0
features   : branchlong, 16-byte atomic ops
cpu number : 0
cpu regs   : 4
cpu MHz    : 1668.672
itc MHz    : 416.667500
BogoMIPS   : 1662.97
siblings   : 4
physical id: 196611
core id    : 1
thread id  : 1

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 17:16     ` Petr Tesarik
@ 2010-08-27 18:20       ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-08-27 18:20 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Luck, Tony, linux-ia64, linux-kernel

On Fri, Aug 27, 2010 at 18:16 Petr Tesarik wrote:
| On Friday 27 of August 2010 18:08:03 Luck, Tony wrote:
| > > Hedi Berriche sent me a simple test case that can
| > > trigger the failure on the siglock.
| >
| > Can you post the test case please. How long does it typically take
| > to reproduce the problem?
| 
| I let Hedi send it. It's really easy to reproduce. In fact, I can reproduce it 
| within 5 minutes on an 8-CPU system.

Test case provided via private email.

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 17:16     ` Petr Tesarik
@ 2010-08-27 19:40       ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 19:40 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 19:16:29 Petr Tesarik wrote:
> On Friday 27 of August 2010 18:08:03 Luck, Tony wrote:
> > > Hedi Berriche sent me a simple test case that can
> > > trigger the failure on the siglock.
> >
> > Can you post the test case please. How long does it typically take
> > to reproduce the problem?
>
> I let Hedi send it. It's really easy to reproduce. In fact, I can reproduce
> it within 5 minutes on an 8-CPU system.
>
> > > Next, CPU 5 releases the spinlock with st2.rel, changing the lock
> > > value to 0x0 (correct).
> > >
> > > SO FAR SO GOOD.
> > >
> > > Now, CPU 4, CPU 5 and CPU 7 all want to acquire the lock again.
> > > Interestingly, CPU 5 and CPU 7 are both granted the same ticket,
> >
> > What is the duplicate ticket number that CPUs 5 & 7 get at this point?
> > Presumably 0x0, yes? Or do they see a stale 0x7fff?
>
> They get a zero, yes.
>
> > > and the spinlock value (as seen from the debug fault handler) is
> > > 0x0 after single-stepping over the fetchadd4.acq, in both cases.
> > > CPU 4 correctly sets the spinlock value to 0x1.
> >
> > Is the fault handler using "ld.acq" to look at the spinlock value?
> > If not, then this might be a red herring. [Though clearly something
> > bad is going on here].
>
> Right. I also realized I was reading the spinlock value with a plain "ld4".
> When I changed it to "ld4.acq", this is what happens:
>
> 1. We're in _spin_lock_irq, which starts like this:
>
> 0xa0000001008ea000 <_spin_lock_irq>:    [MMI]       rsm 0x4000;;
> 0xa0000001008ea001 <_spin_lock_irq+1>:              fetchadd4.acq r15=[r32],1
> 0xa0000001008ea002 <_spin_lock_irq+2>:              nop.i 0x0;;
>
> AFAICS the spinlock value should be 0x0 (after having wrapped around from
> 0xffff0000 at release on the same CPU).
>
> 2. fetchadd4.acq generates a debug exception (because it writes to the
> watched location)
> 3. ld4.acq inside the debug fault handler reads 0x0 from the location
> 4. the handler sets PSR.ss on return
> 5. fetchadd4.acq puts 0x1 (why?) in r15 and generates a Single step fault
> 6. the fault handler now reads 0x0 (sic!) from the spinlock location
> (again, using ld4.acq)
> 7. the resulting kernel crash dump contains ZERO in the spinlock location

I have another crash dump which recorded the same values in the debug fault 
handler, but the resulting crash dump contains 0x1 (not 0x0) in the spinlock. 
R15 was still 0x1 (even though it should contain the original value, not the 
incremented one, shouldn't it?).

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-27 16:08   ` Luck, Tony
@ 2010-08-27 20:29     ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-27 20:29 UTC (permalink / raw)
  To: Petr Tesarik, linux-ia64; +Cc: linux-kernel, Hedi Berriche

> If this is a memory ordering problem (and that seems quite plausible)
> then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock
> routines would probably make it go away.

I think I take this back ... if it were a memory ordering problem, then
it could show up any time - not just at wrap-around.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 20:29     ` Luck, Tony
@ 2010-08-27 20:41       ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 20:41 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 22:29:41 Luck, Tony wrote:
> > If this is a memory ordering problem (and that seems quite plausible)
> > then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock
> > routines would probably make it go away.
>
> I think I take this back ... if it were a memory ordering problem, then
> it could show up any time - not just at wrap-around.

Well, I wasn't originally sure if it only happens at wrap-around. OTOH I've 
now modified my tests, so that they would also catch any other badness, and I 
still only got another two failures after wrap-around.

From looking at the traces, I'm afraid this smells like another Itanium 
erratum. I'm now trying to write a minimal test case...

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 20:29     ` Luck, Tony
@ 2010-08-27 21:03       ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 21:03 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 22:29:41 Luck, Tony wrote:
> > If this is a memory ordering problem (and that seems quite plausible)
> > then a liberal sprinkling of "ia64_mf()" calls throughout the spinlock
> > routines would probably make it go away.
>
> I think I take this back ... if it were a memory ordering problem, then
> it could show up any time - not just at wrap-around.

One more idea. The wrap-around case is the only one when the high word is 
modified. This is in fact the only case when the fetchadd.acq competes with 
the st2.rel about the actual contents of that location. I don't know if it 
matters...

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-27 21:03       ` Petr Tesarik
@ 2010-08-27 21:11         ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-27 21:11 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Hedi Berriche

> One more idea. The wrap-around case is the only one when the high word is 
> modified. This is in fact the only case when the fetchadd.acq competes with 
> the st2.rel about the actual contents of that location. I don't know if it 
> matters...

I pondered that for a while - but I have difficulty believing that
fetchadd looks at which bits changed and only writes back the bytes
that did.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 21:11         ` Luck, Tony
@ 2010-08-27 22:13           ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-27 22:13 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-ia64, linux-kernel, Hedi Berriche

On Friday 27 of August 2010 23:11:55 Luck, Tony wrote:
> > One more idea. The wrap-around case is the only one when the high word is
> > modified. This is in fact the only case when the fetchadd.acq competes
> > with the st2.rel about the actual contents of that location. I don't know
> > if it matters...
>
> I pondered that for a while - but I have difficulty believing that
> fetchadd looks at which bits changed and only writes back the bytes
> that did.

OTOH the counter is only 15-bit, so it also wraps around at 0xfffe7fff, but
I have never seen it fail there. It always fails after the wrap-around from 
0xfffeffff.

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-27 22:13           ` Petr Tesarik
@ 2010-08-27 23:26             ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-27 23:26 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Hedi Berriche

[-- Attachment #1: Type: text/plain, Size: 484 bytes --]

Debugging this in the kernel seemed hard ... so I tried to construct a
user level test using the same code from the kernel. See attached. But
this fails so catastrophically with any number of threads greater than
one, that I suspect I made some mistake cut & pasting the relevant bits
of kernel infrastructure.  The goal of the program is to have several
child processes pounding on a lock while the parent looks in every 5
seconds to see if they are making progress.

-Tony

[-- Attachment #2: spinwrap.c --]
[-- Type: text/plain, Size: 7081 bytes --]

typedef struct {
        volatile unsigned int lock;
} arch_spinlock_t;

#define WORKERS 8

struct sh {
	arch_spinlock_t	l; char pad1[60];
	long success; char pad2[56];
	long worker[WORKERS];
	int locks[WORKERS];
} *s;

int me;

/* cut & paste infrastructure from kernel start here */
typedef unsigned long __u64;
#define __always_inline inline __attribute__((always_inline))
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

extern unsigned long __bad_size_for_ia64_fetch_and_add (void);
extern unsigned long __bad_increment_for_ia64_fetch_and_add (void);

#define ia64_invala() asm volatile ("invala" ::: "memory")

#define ia64_hint_pause 0

#define ia64_hint(mode)                                         \
({                                                              \
        switch (mode) {                                         \
        case ia64_hint_pause:                                   \
                asm volatile ("hint @pause" ::: "memory");      \
                break;                                          \
        }                                                       \
})
#define cpu_relax() ia64_hint(ia64_hint_pause)

#define ia64_fetchadd4_acq(p, inc)                                              \
({                                                                              \
                                                                                \
        __u64 ia64_intri_res;                                                   \
        asm volatile ("fetchadd4.acq %0=[%1],%2"                                \
                                : "=r"(ia64_intri_res) : "r"(p), "i" (inc)      \
                                : "memory");                                    \
                                                                                \
        ia64_intri_res;                                                         \
})

#define IA64_FETCHADD(tmp,v,n,sz,sem)                                           \
({                                                                              \
        switch (sz) {                                                           \
              case 4:                                                           \
                tmp = ia64_fetchadd4_##sem((unsigned int *) v, n);              \
                break;                                                          \
                                                                                \
              case 8:                                                           \
                tmp = ia64_fetchadd8_##sem((unsigned long *) v, n);             \
                break;                                                          \
                                                                                \
              default:                                                          \
                __bad_size_for_ia64_fetch_and_add();                            \
        }                                                                       \
})

#define ia64_fetchadd(i,v,sem)                                                          \
({                                                                                      \
        __u64 _tmp;                                                                     \
        volatile __typeof__(*(v)) *_v = (v);                                            \
        /* Can't use a switch () here: gcc isn't always smart enough for that... */     \
        if ((i) == -16)                                                                 \
                IA64_FETCHADD(_tmp, _v, -16, sizeof(*(v)), sem);                        \
        else if ((i) == -8)                                                             \
                IA64_FETCHADD(_tmp, _v, -8, sizeof(*(v)), sem);                         \
        else if ((i) == -4)                                                             \
                IA64_FETCHADD(_tmp, _v, -4, sizeof(*(v)), sem);                         \
        else if ((i) == -1)                                                             \
                IA64_FETCHADD(_tmp, _v, -1, sizeof(*(v)), sem);                         \
        else if ((i) == 1)                                                              \
                IA64_FETCHADD(_tmp, _v, 1, sizeof(*(v)), sem);                          \
        else if ((i) == 4)                                                              \
                IA64_FETCHADD(_tmp, _v, 4, sizeof(*(v)), sem);                          \
        else if ((i) == 8)                                                              \
                IA64_FETCHADD(_tmp, _v, 8, sizeof(*(v)), sem);                          \
        else if ((i) == 16)                                                             \
                IA64_FETCHADD(_tmp, _v, 16, sizeof(*(v)), sem);                         \
        else                                                                            \
                _tmp = __bad_increment_for_ia64_fetch_and_add();                        \
        (__typeof__(*(v))) (_tmp);      /* return old value */                          \
})

#define TICKET_SHIFT    17
#define TICKET_BITS     15
#define TICKET_MASK     ((1 << TICKET_BITS) - 1)

static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
        int     *p = (int *)&lock->lock, ticket, serve;

        ticket = ia64_fetchadd(1, p, acq);

        if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
                return;

        ia64_invala();

s->locks[me] = ticket;
        for (;;) {
                asm volatile ("ld4.c.nc %0=[%1]" : "=r"(serve) : "r"(p) : "memory");

                if (!(((serve >> TICKET_SHIFT) ^ ticket) & TICKET_MASK)) {
s->locks[me] = 0;
                        return;
		}
                cpu_relax();
        }
}

static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
        unsigned short  *p = (unsigned short *)&lock->lock + 1, tmp;

        asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
        ACCESS_ONCE(*p) = (tmp + 2) & ~1;
}
/* cut & paste infrastructure from kernel ends here */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

work()
{
printf("Starting worker %d\n", me);
	while (1) {
		__ticket_spin_lock(&s->l);
		s->success++;
		s->worker[me]++;
		__ticket_spin_unlock(&s->l);
	}
}

main(int argc, char **argv)
{
	int i, pid;
	int workers = WORKERS;

	if (argc > 1) {
		workers = atoi(argv[1]);
		if (workers < 1 || workers > WORKERS)
			workers = WORKERS;
	}
	s = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0L);

	printf("shared mapping at %p\n", s);

	for (i = 0; i < workers; i++) switch (pid = fork()) {
	case -1: perror("fork"); return 1;
	case 0: me = i; work(); return 0;
	}

	while (1) {
		sleep(5);
		printf("%ld [lock = %.8x]\n", s->success, s->l.lock);
		for (i = 0; i < workers; i++)
			printf(" %ld %.8x\n", s->worker[i], s->locks[i]);
	}
}

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-27 23:26             ` Luck, Tony
@ 2010-08-27 23:55               ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-27 23:55 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel, Hedi Berriche

> this fails so catastrophically

Interesting ... getting rid of some of the fancy asm bits
(replacing both the ld4.c.nc and the ld2.bias with simple
"serve = *p" and "tmp = *p") makes this run a lot better.

-Tony


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-27 23:55               ` Luck, Tony
@ 2010-08-28  0:28                 ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-08-28  0:28 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Petr Tesarik, linux-ia64, linux-kernel

On Sat, Aug 28, 2010 at 00:55 Tony Luck wrote:
| > this fails so catastrophically
| 
| Interesting ... getting rid of some of the fancy asm bits
| (replacing both the ld4.c.nc and the ld2.bias with simple
| "serve = *p" and "tmp = *p") makes this run a lot better.

A *hasty* look suggests that keeping the fancy asm bits but compiling with
-O2 -frename-registers makes it work equally well.

Don't take my word for it though; double-check it. It's been a long week and
my brain is anything but alert at this time of night.

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-28  0:28                 ` Hedi Berriche
@ 2010-08-28  5:01                   ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-28  5:01 UTC (permalink / raw)
  To: Hedi Berriche; +Cc: Petr Tesarik, linux-ia64, linux-kernel

>*Hasty* look seems to suggest that keeping the fancy asm bits but compiling
>with -O2 -frename-registers makes it work equally well.
>
>Don't take my word for it though, double check it, it's been a long week and
>brain is anything but alert at this time of the night.

Yup - that makes the usermode version run just fine (no problems
in the first billion lock/unlock operations ... which is
quite a lot of wrap-arounds of the 15-bit ticket lock).

And since we already use -frename-registers when building the
kernel, no immediate help for the kernel problem. :-(

I may tinker with this test a bit to include some short random
amounts of hold-time for the lock, and delays between attempts
to acquire it (to make it look more like a contended kernel lock
and less like a continuous queue of processes trading around a
lock that is never free). Petr's debug information definitely
showed the lock becoming free at the wraparound (lock == 0x0).

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Serious problem with ticket spinlocks on ia64
  2010-08-28  5:01                   ` Luck, Tony
@ 2010-08-30 18:17                     ` Luck, Tony
  -1 siblings, 0 replies; 72+ messages in thread
From: Luck, Tony @ 2010-08-30 18:17 UTC (permalink / raw)
  To: Hedi Berriche; +Cc: Petr Tesarik, linux-ia64, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1356 bytes --]

> I may tinker with this test a bit to include some short random
> amounts of hold-time for the lock, and delays between attempts
> to acquire it (to make it look more like a contended kernel lock
> and less like a continuous queue of processes trading around a
> lock that is never free

I've been iterating ... adding new bits to try to reproduce the
kernel environment:

1) Added delays (a bigger delay while not holding the lock than while
   holding it, so contention is controlled).
2) Added a check that locking actually works (a critzone element that
   is only modified while the lock is held).
3) Made some workers use trylock (all the odd-numbered workers do this).

Compile with -frename-registers ... and add a nop() { } function
in another file (just to make sure the compiler doesn't optimize
away the delay loops).

Sadly ... my user mode experiments haven't yet yielded any cases
where the ticket locks fail in the way that Petr saw them mess up
inside the kernel.  This latest version has been running for ~90
minutes and has completed 25 million lock/trylock iterations (with
about a third of the ticket lock wraparounds passing through the
uncontested case (lock == 0) and the rest happening with some
processes waiting for the lock).

So now I'm trying to think of other ways that the kernel case
differs from my user mode mock-up.

-Tony

[-- Attachment #2: spinwrap.c --]
[-- Type: text/plain, Size: 12024 bytes --]

typedef struct {
        volatile unsigned int lock;
} arch_spinlock_t;

#define WORKERS 16
int workers = WORKERS;

struct sh {
	arch_spinlock_t	l; char pad1[60];
	long success; char pad2[56];
	unsigned long critzone; char pad3[56];
	unsigned long running; char pad4[56];
	long worker[WORKERS];
	int wraps[WORKERS];
} *s;

int me;

/* cut & paste infrastructure from kernel start here */
typedef unsigned long __u64;
typedef unsigned int __u32;
typedef unsigned short __u16;
typedef unsigned char __u8;
#define __always_inline inline __attribute__((always_inline))
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

extern unsigned long __bad_size_for_ia64_fetch_and_add (void);
extern unsigned long __bad_increment_for_ia64_fetch_and_add (void);

#define ia64_invala() asm volatile ("invala" ::: "memory")

#define ia64_hint_pause 0

#define ia64_hint(mode)                                         \
({                                                              \
        switch (mode) {                                         \
        case ia64_hint_pause:                                   \
                asm volatile ("hint @pause" ::: "memory");      \
                break;                                          \
        }                                                       \
})
#define cpu_relax() ia64_hint(ia64_hint_pause)

#define ia64_fetchadd4_acq(p, inc)                                              \
({                                                                              \
                                                                                \
        __u64 ia64_intri_res;                                                   \
        asm volatile ("fetchadd4.acq %0=[%1],%2"                                \
                                : "=r"(ia64_intri_res) : "r"(p), "i" (inc)      \
                                : "memory");                                    \
                                                                                \
        ia64_intri_res;                                                         \
})

#define IA64_FETCHADD(tmp,v,n,sz,sem)                                           \
({                                                                              \
        switch (sz) {                                                           \
              case 4:                                                           \
                tmp = ia64_fetchadd4_##sem((unsigned int *) v, n);              \
                break;                                                          \
                                                                                \
              case 8:                                                           \
                tmp = ia64_fetchadd8_##sem((unsigned long *) v, n);             \
                break;                                                          \
                                                                                \
              default:                                                          \
                __bad_size_for_ia64_fetch_and_add();                            \
        }                                                                       \
})

#define ia64_fetchadd(i,v,sem)                                                          \
({                                                                                      \
        __u64 _tmp;                                                                     \
        volatile __typeof__(*(v)) *_v = (v);                                            \
        /* Can't use a switch () here: gcc isn't always smart enough for that... */     \
        if ((i) == -16)                                                                 \
                IA64_FETCHADD(_tmp, _v, -16, sizeof(*(v)), sem);                        \
        else if ((i) == -8)                                                             \
                IA64_FETCHADD(_tmp, _v, -8, sizeof(*(v)), sem);                         \
        else if ((i) == -4)                                                             \
                IA64_FETCHADD(_tmp, _v, -4, sizeof(*(v)), sem);                         \
        else if ((i) == -1)                                                             \
                IA64_FETCHADD(_tmp, _v, -1, sizeof(*(v)), sem);                         \
        else if ((i) == 1)                                                              \
                IA64_FETCHADD(_tmp, _v, 1, sizeof(*(v)), sem);                          \
        else if ((i) == 4)                                                              \
                IA64_FETCHADD(_tmp, _v, 4, sizeof(*(v)), sem);                          \
        else if ((i) == 8)                                                              \
                IA64_FETCHADD(_tmp, _v, 8, sizeof(*(v)), sem);                          \
        else if ((i) == 16)                                                             \
                IA64_FETCHADD(_tmp, _v, 16, sizeof(*(v)), sem);                         \
        else                                                                            \
                _tmp = __bad_increment_for_ia64_fetch_and_add();                        \
        (__typeof__(*(v))) (_tmp);      /* return old value */                          \
})

#define ia64_cmpxchg4_acq(ptr, new, old)                                                \
({                                                                                      \
        __u64 ia64_intri_res;                                                           \
        asm volatile ("mov ar.ccv=%0;;" :: "rO"(old));                                  \
        asm volatile ("cmpxchg4.acq %0=[%1],%2,ar.ccv":                                 \
                              "=r"(ia64_intri_res) : "r"(ptr), "r"(new) : "memory");    \
        ia64_intri_res;                                                                 \
})

/*
 * This function doesn't exist, so you'll get a linker error
 * if something tries to do an invalid cmpxchg().
 */
extern long ia64_cmpxchg_called_with_bad_pointer (void);

#define ia64_cmpxchg(sem,ptr,old,new,size)                                              \
({                                                                                      \
        __u64 _o_, _r_;                                                                 \
                                                                                        \
        switch (size) {                                                                 \
              case 1: _o_ = (__u8 ) (long) (old); break;                                \
              case 2: _o_ = (__u16) (long) (old); break;                                \
              case 4: _o_ = (__u32) (long) (old); break;                                \
              case 8: _o_ = (__u64) (long) (old); break;                                \
              default: break;                                                           \
        }                                                                               \
        switch (size) {                                                                 \
              case 1:                                                                   \
                _r_ = ia64_cmpxchg1_##sem((__u8 *) ptr, new, _o_);                      \
                break;                                                                  \
                                                                                        \
              case 2:                                                                   \
               _r_ = ia64_cmpxchg2_##sem((__u16 *) ptr, new, _o_);                      \
                break;                                                                  \
                                                                                        \
              case 4:                                                                   \
                _r_ = ia64_cmpxchg4_##sem((__u32 *) ptr, new, _o_);                     \
                break;                                                                  \
                                                                                        \
              case 8:                                                                   \
                _r_ = ia64_cmpxchg8_##sem((__u64 *) ptr, new, _o_);                     \
                break;                                                                  \
                                                                                        \
              default:                                                                  \
                _r_ = ia64_cmpxchg_called_with_bad_pointer();                           \
                break;                                                                  \
        }                                                                               \
        (__typeof__(old)) _r_;                                                          \
})

#define TICKET_SHIFT    17 /*was17*/
#define TICKET_BITS     15 /*was15*/
#define TICKET_MASK     ((1 << TICKET_BITS) - 1)

static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
        int     *p = (int *)&lock->lock, ticket, serve;

        ticket = ia64_fetchadd(1, p, acq);

if (ticket == 0) s->wraps[me]++;

        if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
                return;

        ia64_invala();

        for (;;) {
                asm volatile ("ld4.c.nc %0=[%1]" : "=r"(serve) : "r"(p) : "memory");

                if (!(((serve >> TICKET_SHIFT) ^ ticket) & TICKET_MASK)) {
                        return;
		}
                cpu_relax();
        }
}

static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
        int tmp = ACCESS_ONCE(lock->lock);

if (tmp == 0) s->wraps[me]++;
        if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))
                return ia64_cmpxchg(acq, &lock->lock, tmp, tmp + 1, sizeof (tmp)) == tmp;
        return 0;
}

static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
        unsigned short  *p = (unsigned short *)&lock->lock + 1, tmp;

        asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
        ACCESS_ONCE(*p) = (tmp + 2) & ~1;
}
/* cut & paste infrastructure from kernel ends here */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

extern void nop(void);

void
delay(int scale)
{
	int	r = random();

	r &= 0xffff;

	r *= scale;

	while (r-- > 0)
		nop();
}

void
work(void)
{
	unsigned long crit, mymask = (1ul << me);
	int gotlock;

	srandom(getpid());
	while (s->running) {
		delay(workers);
		if (me & 1) {
			gotlock = __ticket_spin_trylock(&s->l);
		} else {
			__ticket_spin_lock(&s->l);
			gotlock = 1;
		}
		if (gotlock) {
			if ((crit = s->critzone)) {	/* assignment intended */
				s->running = 0;
				printf("Worker %d saw %lx at entry\n", me, crit);
				break;
			}
			s->critzone |= mymask;
			s->success++;
			s->worker[me]++;

			delay(1);

			if ((crit = s->critzone) != mymask) {
				printf("Worker %d saw %lx at exit\n", me, crit);
				s->running = 0;
				break;
			}
			s->critzone = 0;
			__ticket_spin_unlock(&s->l);
		}
	}
}

int
main(int argc, char **argv)
{
	int i, pid;

	if (argc > 1) {
		workers = atoi(argv[1]);
		if (workers < 1 || workers > WORKERS)
			workers = WORKERS;
	}
	s = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0L);
	if (s == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	printf("shared mapping at %p\n", s);
	s->running = 1;

	for (i = 0; i < workers; i++) switch (pid = fork()) {
	case -1: perror("fork"); s->running = 0; return 1;
	case 0: me = i; work(); return 0;
	}

	while (s->running) {
		sleep(5);
		printf("%ld [lock = %.8x]\n", s->success, s->l.lock);
		for (i = 0; i < workers; i++)
			printf(" %ld %.8x\n", s->worker[i], s->wraps[i]);
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 72+ messages in thread

        __u64 _o_, _r_;                                                                 \
                                                                                        \
        switch (size) {                                                                 \
              case 1: _o_ = (__u8 ) (long) (old); break;                                \
              case 2: _o_ = (__u16) (long) (old); break;                                \
              case 4: _o_ = (__u32) (long) (old); break;                                \
              case 8: _o_ = (__u64) (long) (old); break;                                \
              default: break;                                                           \
        }                                                                               \
        switch (size) {                                                                 \
              case 1:                                                                   \
                _r_ = ia64_cmpxchg1_##sem((__u8 *) ptr, new, _o_);                      \
                break;                                                                  \
                                                                                        \
              case 2:                                                                   \
                _r_ = ia64_cmpxchg2_##sem((__u16 *) ptr, new, _o_);                     \
                break;                                                                  \
                                                                                        \
              case 4:                                                                   \
                _r_ = ia64_cmpxchg4_##sem((__u32 *) ptr, new, _o_);                     \
                break;                                                                  \
                                                                                        \
              case 8:                                                                   \
                _r_ = ia64_cmpxchg8_##sem((__u64 *) ptr, new, _o_);                     \
                break;                                                                  \
                                                                                        \
              default:                                                                  \
                _r_ = ia64_cmpxchg_called_with_bad_pointer();                           \
                break;                                                                  \
        }                                                                               \
        (__typeof__(old)) _r_;                                                          \
})

#define TICKET_SHIFT    17 /*was17*/
#define TICKET_BITS     15 /*was15*/
#define TICKET_MASK     ((1 << TICKET_BITS) - 1)

static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
        int     *p = (int *)&lock->lock, ticket, serve;

        ticket = ia64_fetchadd(1, p, acq);

if (ticket == 0) s->wraps[me]++;

        if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
                return;

        ia64_invala();

        for (;;) {
                asm volatile ("ld4.c.nc %0=[%1]" : "=r"(serve) : "r"(p) : "memory");

                if (!(((serve >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
                        return;
                cpu_relax();
        }
}

static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
{
        int tmp = ACCESS_ONCE(lock->lock);

if (tmp == 0) s->wraps[me]++;
        if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))
                return ia64_cmpxchg(acq, &lock->lock, tmp, tmp + 1, sizeof (tmp)) == tmp;
        return 0;
}

static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
{
        unsigned short  *p = (unsigned short *)&lock->lock + 1, tmp;

        asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
        ACCESS_ONCE(*p) = (tmp + 2) & ~1;
}
/* cut & paste infrastructure from kernel ends here */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

extern void nop(void);

void
delay(int scale)
{
	int	r = random();

	r &= 0xffff;

	r *= scale;

	while (r-- > 0)
		nop();
}

void
work(void)
{
	unsigned long crit, mymask = (1ul << me);
	int gotlock;

	srandom(getpid());
	while (s->running) {
		delay(workers);
		if (me & 1) {
			gotlock = __ticket_spin_trylock(&s->l);
		} else {
			__ticket_spin_lock(&s->l);
			gotlock = 1;
		}
		if (gotlock) {
			if ((crit = s->critzone)) {
				s->running = 0;
				printf("Worker %d saw %lx at entry\n", me, crit);
				break;
			}
			s->critzone |= mymask;
			s->success++;
			s->worker[me]++;

			delay(1);

			if ((crit = s->critzone) != mymask) {
				printf("Worker %d saw %lx at exit\n", me, crit);
				s->running = 0;
				break;
			}
			s->critzone = 0;
			__ticket_spin_unlock(&s->l);
		}
	}
}

int
main(int argc, char **argv)
{
	int i, pid;

	if (argc > 1) {
		workers = atoi(argv[1]);
		if (workers < 1 || workers > WORKERS)
			workers = WORKERS;
	}
	s = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0L);

	printf("shared mapping at %p\n", s);
	s->running = 1;

	for (i = 0; i < workers; i++) switch (pid = fork()) {
	case -1: perror("fork"); s->running = 0; return 1;
	case 0: me = i; work(); return 0;
	}

	while (s->running) {
		sleep(5);
		printf("%ld [lock = %.8x]\n", s->success, s->l.lock);
		for (i = 0; i < workers; i++)
			printf(" %ld %.8x\n", s->worker[i], s->wraps[i]);
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-30 18:17                     ` Luck, Tony
@ 2010-08-30 21:41                       ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-08-30 21:41 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Hedi Berriche, linux-ia64, linux-kernel

On Monday 30 of August 2010 20:17:25 Luck, Tony wrote:
> > I may tinker with this test a bit to include some short random
> > amounts of hold-time for the lock, and delays between attempts
> > to acquire it (to make it look more like a contended kernel lock
> > and less like a continuous queue of processes trading around a
> > lock that is never free
>
> I've been iterating ... adding new bits to try to reproduce the
> kernel environment:

Hi Tony,

I've been also playing with my test case, and I haven't been able to reproduce 
it in user-space either. One thing I noticed was the apparently incorrect use 
of ALAT. The generated code for _spin_lock_irq contains:

invala;;
ld4.c.nc r11=[r32]
// Other instructions not affecting r20
ld4.c.nc r20=[r32]

IIUC, the subsequent compare can use an undefined value (r20 is not modified 
anywhere in this function, except by the ld4.c.nc, but that happens only on 
an ALAT miss, right?).

I changed the corresponding code in __ticket_spin_lock to:

   asm volatile ("ld4.c.nc %0=[%1]" : "+r"(serve) : "r"(p) : "memory");

(NB the "+r" constraint instead of "=r")

The generated code now re-uses r15. Unfortunately, Hedi's test case still 
fails for me. :(

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-30 21:41                       ` Petr Tesarik
@ 2010-08-30 22:43                         ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-08-30 22:43 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Hedi Berriche, linux-ia64, linux-kernel

On Mon, Aug 30, 2010 at 2:41 PM, Petr Tesarik <ptesarik@suse.cz> wrote:
> IIUC, the subsequent compare can use an undefined value (r20 is not modified
> anywhere in this function, except by the ld4.c.nc, but that happens only on
> an ALAT miss, right?).

I don't see that in my kernel - but you raise a good point. The idea in this
code is that invala makes sure that we don't have a matching ALAT entry
on the first iteration of the loop, so we will do the load. Then we loop not
actually doing the access until the ALAT tells us that the location may
have changed, when we load again.  But this assumes that the compiler
didn't decide to re-use the register for something else in the loop. So
things may be a bit fragile if some aggressive optimizations happen after
the routine is inlined.

Does Hedi's test fail for you if you drop the fancy ALAT trick?
I.e. just use "serve = *p;" in __ticket_spin_lock()? [& similar in
__ticket_spin_unlock_wait() if you want - but I don't think that
is ever used for siglock]

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-30 22:43                         ` Tony Luck
@ 2010-08-31 22:17                           ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-08-31 22:17 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Hedi Berriche, linux-ia64, linux-kernel

On Mon, Aug 30, 2010 at 3:43 PM, Tony Luck <tony.luck@intel.com> wrote:
> Does Hedi's test fail for you if you drop the fancy ALAT trick?
> I.e. just use "serve = *p;" in __ticket_spin_lock()?

Answering my own question ... it failed in 47 minutes with the breakit
script on iteration 2812 :-(

So it would appear that the problem isn't related to the ALAT, or weird
compiler optimizations around the inline asm.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-08-31 22:17                           ` Tony Luck
@ 2010-09-01 23:09                             ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-09-01 23:09 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Hedi Berriche, linux-ia64, linux-kernel

More results from other experiments ...

1) It occurred to me that I should check that these test cases weren't
hitting some other problem in 2.6.36-rc3. So I ported the 64-bit
version of ticket locks to the current kernel and ran the stress test.
It was still going strong at 16 hours (where all my other experiments
tend to fail at 90 minutes or less).

2) Next I investigated whether wrap-around was related by reducing
TICKET_BITS from 15 to 8 (I only have 32 cpus, so this should be
plenty). I also moved the bit offset of the "now serving" value to
different spots in the high half of the lock to check whether we were
hitting some issues with overflow from the fetchadd on the low half
into the high half, or some sign problem when bit 31 was set.  These
tests all failed in 20 minutes to an hour (not significantly different
from TICKET_BITS=15) ... so wraparound appears not to be an issue.

3) Then I wondered whether it was a problem that we used fetchadd4
which modifies all 32 bits in an atomic instruction when acquiring the
lock, but a simple st2 to write just the upper 16 bits when doing the
unlock. So I recoded __ticket_spin_unlock() to spin on a cmpxchg call
to update all 32-bits with an atomic instruction. This one failed in
34 minutes.

4) Memory ordering? I added ia64_mf() calls liberally throughout all
the __ticket_* routines. Kernel failed in 32 minutes.

Summary: the only change that helps is the 64-bit ticket locks.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-01 23:09                             ` Tony Luck
@ 2010-09-02  0:26                               ` Hedi Berriche
  -1 siblings, 0 replies; 72+ messages in thread
From: Hedi Berriche @ 2010-09-02  0:26 UTC (permalink / raw)
  To: Tony Luck; +Cc: Petr Tesarik, linux-ia64, linux-kernel

On Thu, Sep 02, 2010 at 00:10 Tony Luck wrote:
| More results from other experiments ...
| 
| 1) It occurred to me that I should check that these test cases weren't
| hitting some other problem in 2.6.36-rc3. So I ported the 64-bit
| version of ticket locks to the current kernel and ran the stress test.
| It was still going strong at 16 hours (where all my other experiments
| tend to fail at 90 minutes or less).

This is consistent with the bisection that led me to the 4-byte ticket
locks commit.

| Summary: the only change that helps is the 64-bit ticket locks.

Ditto.

Cheers,
Hedi.
-- 
Hedi Berriche
Global Product Support
http://www.sgi.com/support

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-02  0:26                               ` Hedi Berriche
@ 2010-09-03  0:06                                 ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-09-03  0:06 UTC (permalink / raw)
  To: Tony Luck, Petr Tesarik, linux-ia64, linux-kernel

Today's experiments were inspired by Petr's comment at the start of this thread:

   "Interestingly, CPU 5 and CPU 7 are both granted the same ticket"

I added an "owner" element to every lock - I have 32 cpus, so I made
it "unsigned int". Then added to the lock and trylock paths code to
check that owner was 0 when the lock was granted, followed by:
lock->owner |= (1u << cpu);  Then in the unlock path I check that just
the (1u << cpu) bit is set before doing: lock->owner &= ~(1u << cpu);

In my first test I got a hit. cpu28 had failed to get the lock and was
spinning holding ticket "1". When "now serving" hit 1, cpu28 saw that
the owner field was set to 0x1, indicating that cpu0 had also claimed
the lock. The lockword was 0x20002 at this point ... so cpu28 was
correct to believe that the lock had been freed and handed to it.  It
was unclear why cpu0 had muscled in and set its bit in the owner
field. Also can't tell whether that was a newly allocated lock, or one
that had recently wrapped around.

Subsequent tests have failed to reproduce that result - system just
hangs without complaining about multiple cpus owning the same lock at
the same time - perhaps because of the extra tracing I included to
capture more details.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03  0:06                                 ` Tony Luck
@ 2010-09-03  9:04                                   ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-09-03  9:04 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-ia64, linux-kernel

On Friday 03 of September 2010 02:06:33 Tony Luck wrote:
> Today's experiments were inspired by Petr's comment at the start of this
> thread:
>
>    "Interestingly, CPU 5 and CPU 7 are both granted the same ticket"
>
> I added an "owner" element to every lock - I have 32 cpus, so I made
> it "unsigned int". Then added to the lock and trylock paths code to
> check that owner was 0 when the lock was granted, followed by:
> lock->owner |= (1u << cpu);  Then in the unlock path I check that just
> the (1u << cpu) bit is set before doing: lock->owner &= ~(1u << cpu);
>
> In my first test I got a hit. cpu28 had failed to get the lock and was
> spinning holding ticket "1". When "now serving" hit 1, cpu28 saw that
> the owner field was set to 0x1, indicating that cpu0 had also claimed
> the lock. The lockword was 0x20002 at this point ... so cpu28 was
> correct to believe that the lock had been freed and handed to it.  It
> was unclear why cpu0 had muscled in and set its bit in the owner
> field. Also can't tell whether that was a newly allocated lock, or one
> that had recently wrapped around.
>
> Subsequent tests have failed to reproduce that result - system just
> hangs without complaining about multiple cpus owning the same lock at
> the same time - perhaps because of the extra tracing I included to
> capture more details.

I did some extensive testing of the issue. I wrote a Kprobe that attaches to 
copy_process and if the new task is one of the "count" processes, it sets up 
a pair of DBR registers to watch for all writes to the siglock. (Obviously, I 
had to limit parallel runs of "count" to 4, because there are only 8 dbr 
registers.) When I hit the breakpoint, I record the old value (with ld4.acq), 
single step one instruction and read the new value (with ld4.acq). The code 
panics the machine (producing a core-dump) if neither the new head is larger 
than the old head nor the new tail is larger than the old tail.

What I got is rather disturbing. I got three different traces so far, all of 
them on the same fetchadd4.acq instruction. The observed values are:

   BEFORE   reg   AFTER   DUMP
A.    0      1       0      0
B.    1      0       1      1
C.    0      1       0      1

BEFORE .. value seen by ld4.acq in the first debug fault
reg .. value in the target register of fetchadd
AFTER .. value seen by ld4.acq after single step
DUMP .. value saved to the crash dump

Interestingly, sometimes there was no write recorded with the new value equal 
to the BEFORE column. Then it occurred to me that I probably missed some 
writes from interrupt context, because psr.db gets cleared by the CPU. So I 
modified ivt.S so that it explicitly re-enabled psr.db. And I got a crash 
dump with variant C.

I thought that I still missed some writes somehow, but consider that I never 
got any failures other than after a wrap-around, even though the code would 
catch any case where the lock does not increment correctly.

Moreover, variant B cannot be explained even if I did miss a fetchadd4. How 
can we get 1 on the first ld4.acq, and then 0 from the fetchadd4.acq?

I'm now trying to modify the lock primitives:

1. replace the fetchadd4.acq with looping over cmpxchg
2. replace the st2.rel with looping over cmpxchg

I'll write again when I have the results.

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03  9:04                                   ` Petr Tesarik
@ 2010-09-03 14:35                                     ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-09-03 14:35 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-ia64, linux-kernel

On Friday 03 of September 2010 11:04:37 Petr Tesarik wrote:
> [...]
> I'm now trying to modify the lock primitives:
>
> 1. replace the fetchadd4.acq with looping over cmpxchg

I did this and I feel dumber than ever. Basically, I replaced this snippet:

	ticket = ia64_fetchadd(1, p, acq);

with:

	int	tmp;
	do {
		ticket = ACCESS_ONCE(lock->lock);
		asm volatile (
			"mov ar.ccv=%1\n"
			"add %0=1,%1;;\n"
			"cmpxchg4.acq %0=[%2],%0,ar.ccv\n"
			: "=r" (tmp)
			: "r" (ticket), "r" (&lock->lock)
			: "ar.ccv");
	} while (tmp != ticket);

Just to make sure I didn't miss something, this compiled to:

0xa0000001008dacb0: [MMI]       nop.m 0x0
0xa0000001008dacb1:             ld4.acq r15=[r32]
0xa0000001008dacb2:             nop.i 0x0;;
0xa0000001008dacc0: [MII]       mov.m ar.ccv=r15
0xa0000001008dacc1:             adds r14=1,r15;;
0xa0000001008dacc2:             nop.i 0x0
0xa0000001008dacd0: [MII]       cmpxchg4.acq r14=[r32],r14,ar.ccv
0xa0000001008dacd1:             nop.i 0x0
0xa0000001008dacd2:             nop.i 0x0;;
0xa0000001008dace0: [MIB]       nop.m 0x0
0xa0000001008dace1:             cmp4.eq p7,p6=r14,r15
0xa0000001008dace2:       (p06) br.cond.dptk.few 0xa0000001008dacb0

My test module recorded the following sequence on the failing CPU:

  }, {
    ip = 0xa00000010012f7b0,
    addr = 0xe000000181925c08,
    oldvalue = 0xffff0000,
    newvalue = 0x0,
    task = 0xe000000186930000
  }, {
    ip = 0xa0000001008dacd0,
    addr = 0xe000000181925c08,
    oldvalue = 0x0,
    newvalue = 0x0,
    task = 0xe000000186930000
  }, {
    ip = 0xa0000001008dacd0,
    addr = 0xe000000181925c08,
    oldvalue = 0x1,
    newvalue = 0x0,
    task = 0xe000000186930000
  }, {
    ip = 0xa0000001008dacd0,
    addr = 0xe000000181925c08,
    oldvalue = 0x1,
    newvalue = 0x0,
    task = 0xe000000186930000
  }, {
    ip = 0xa0000001008dacd0,
    addr = 0xe000000181925c08,
    oldvalue = 0x0,
    newvalue = 0x0,
    task = 0xe000000186930000
  }, {
    ip = 0xa0000001008dacd0,
    addr = 0xe000000181925c08,
    oldvalue = 0x1,
    newvalue = 0x1,
    task = 0xe000000186930000
  }, {

I didn't see values around zero on any other CPU in the system. So, either 
there is something seriously broken in hardware, or I made a silly mistake in 
the monitoring code.

I'm attaching my SystemTap script. I know it's hacky, but it worked for me.

Oh, I had to make two modifications to the running kernel:

1. in ia64_fault()
By default the value of cr.ifa is not passed to the die notifiers, so I 
(mis)used the ar_ssd field to store the ifa before calling notify_die() for 
the debug faults.

2. in ivt.S
On all interrupt entries I added code similar to this (just using different 
registers if appropriate):

	movl r3 = (1 << 24)
	mov r15 = psr
	;;
	or  r3 = r3,r15
	;;
	mov psr.l = r3
	;;
	srlz.d
	;;

Am I blind and did I do something obviously wrong?

Petr Tesarik

[-- Attachment #2: watchlock.stp --]
[-- Type: text/x-csrc, Size: 7510 bytes --]

/* Watch the spin lock for changes */

%{
#include <linux/kdebug.h>
#include <asm/kregs.h>
#include <asm/hw_irq.h>
%}

%{
static void
init_and_panic(const char *msg)
{
	unsigned int cpu, self_cpu;
	self_cpu = smp_processor_id();
	for_each_online_cpu(cpu)
		if (cpu != self_cpu)
			platform_send_ipi(cpu, 0, IA64_IPI_DM_INIT, 0);

	panic(msg);
}

static inline unsigned long
get_siglock(unsigned long addr)
{
	unsigned long ret;

	asm volatile ("ld4.acq %0=[%1]\n" : "=r" (ret) : "r" (addr));
	return ret;
}

struct fault_info {
	unsigned long ip, addr;
	unsigned long oldvalue, newvalue;
	struct task_struct *task;
};

#define DBR_COUNT	4

static DEFINE_SPINLOCK(dbr_lock);
static unsigned long dbr[DBR_COUNT];

#define HISTORY_SIZE	256

static DEFINE_PER_CPU(struct fault_info[HISTORY_SIZE], siglock_history);
static DEFINE_PER_CPU(unsigned int, siglock_history_idx);

static unsigned long
record_oldsiglock(struct pt_regs *regs, unsigned long addr)
{
	unsigned int *idx = &get_cpu_var(siglock_history_idx);
	struct fault_info *info = &get_cpu_var(siglock_history)[*idx];
	unsigned long oldval;

	info->ip = regs->cr_iip;
	info->addr = addr;
	info->oldvalue = oldval = get_siglock(info->addr);
	info->task = current;

	put_cpu_var(siglock_history);
	*idx = (*idx + 1) % HISTORY_SIZE;
	put_cpu_var(siglock_history_idx);

	return oldval;
}

static unsigned long
record_newsiglock(struct pt_regs *regs)
{
	unsigned int idx = (get_cpu_var(siglock_history_idx) - 1) % HISTORY_SIZE;
	struct fault_info *info = &get_cpu_var(siglock_history)[idx];
	unsigned long newval;

	BUG_ON(info->task != current);
	info->newvalue = newval = get_siglock(info->addr);

	put_cpu_var(siglock_history);
	put_cpu_var(siglock_history_idx);

	return newval;
}

#define TICKET_SHIFT    17
#define TICKET_BITS     15
#define TICKET_MASK     ((1 << TICKET_BITS) - 1)

static int
goes_wrong(void)
{
	unsigned int previdx;
	struct fault_info *info;
	long prevhead, curhead;
	long prevtail, curtail;

	previdx = (get_cpu_var(siglock_history_idx) - 1) % HISTORY_SIZE;
	put_cpu_var(siglock_history_idx);

	info = get_cpu_var(siglock_history);
	prevhead = info[previdx].oldvalue & TICKET_MASK;
	curhead = info[previdx].newvalue & TICKET_MASK;
	prevtail = (info[previdx].oldvalue >> TICKET_SHIFT) & TICKET_MASK;
	curtail = (info[previdx].newvalue >> TICKET_SHIFT) & TICKET_MASK;
	put_cpu_var(siglock_history);

	if (curtail == prevtail &&
	    (((curhead - prevhead) & TICKET_MASK) == 0 ||
	     ((curhead - prevhead) & TICKET_MASK) >= (1 << 14)))
		return 1;

	return 0;
}

static unsigned long
get_gr(struct pt_regs *regs, int num)
{
	switch(num) {
	case 1:
		return regs->r1;
	case 2 ... 3:
		return (&regs->r2)[num - 2];
	case 4:
		{
			register unsigned long r4 asm("r4");
			return r4;
		}
	case 5:
		{
			register unsigned long r5 asm("r5");
			return r5;
		}
	case 6:
		{
			register unsigned long r6 asm("r6");
			return r6;
		}
	case 7:
		{
			register unsigned long r7 asm("r7");
			return r7;
		}
	case 8 ... 11:
		return (&regs->r8)[num - 8];
	case 12 ... 13:
	case 15:
		return (&regs->r12)[num - 12];
	case 14:
		return regs->r14;
	case 16 ... 31:
		return (&regs->r16)[num - 16];
	case 32 ... 127:
		/* Too lazy to write code to access the backing store.
		   Just let debug_notify() ignore this access */
		return regs->ar_ccv ^ 1;
	default:
		BUG();
	}
}

static int
should_modify(struct pt_regs *regs)
{
	unsigned long opcode;
	long regnum;

	/* Check if this is a cmpxchg4 instruction */
	/* This is a HACK and only works for our special case */
	opcode = *(unsigned long*)regs->cr_iip;
	if ((opcode & 0x3fd900000003UL) != 0x101100000001UL)
		return 1;

	/* Extract the target register number */
	regnum = (opcode >> 11) & 0x7f;

	/* Check whether the compare was successful (the register
	   value is equal to ar.ccv) */
	return get_gr(regs, regnum) == regs->ar_ccv;
}

static int
debug_notify(struct notifier_block *self, unsigned long val, void *data)
{
	struct die_args *args = (struct die_args *)data;
	struct pt_regs *regs = args->regs;
	unsigned long vector = args->err;
	unsigned long lockaddr = regs->ar_ssd & ~0x3UL;

	if (val != DIE_FAULT)
		return NOTIFY_DONE;

	if (vector == 29) {
		unsigned long siglock_value =
			record_oldsiglock(regs, lockaddr);

		if ((siglock_value & ~1ULL) == 0x40000)
			init_and_panic("Spinlock corrupted already!");

		regs->cr_ipsr |= IA64_PSR_DD | IA64_PSR_SS;
		return NOTIFY_STOP;
	} else if (vector == 36) {
		unsigned long siglock_value = record_newsiglock(regs);

		if (should_modify(regs) && goes_wrong())
			init_and_panic("Spinlock goes wrong!");
		if ((siglock_value & ~1ULL) == 0x40000)
			init_and_panic("Caught the corruption!");

		regs->cr_ipsr &= ~IA64_PSR_SS;
		return NOTIFY_STOP;
	}

	return NOTIFY_DONE;
}

static struct notifier_block debug_notifier = {
	.notifier_call = debug_notify,
	.priority = 0x7ffffffe
};

%}

%{
static void
load_dbr(void *dummy)
{
	unsigned long defmask = (((1UL << 56) - 1) & ~0x3ULL) +
		(0xfUL << 56) +	/* enable in all privilege levels */
		(1UL << 62);	/* trigger on write */
	int i;

	for (i = 0; i < DBR_COUNT; ++i) {
		unsigned long mask = dbr[i] ? defmask : 0UL;

		asm volatile (
			"mov dbr[%0]=%1;;\n"
			"mov dbr[%2]=%3;;\n"
			: : "r" (2*i), "r" (dbr[i]), "r" (2*i+1), "r" (mask));
	}
	asm volatile("srlz.d;;");
}
%}

function enable_debug ()
%{
	load_dbr(NULL);
	CONTEXT->regs->cr_ipsr |= IA64_PSR_DB;
%}

function disable_debug ()
%{
	unsigned long flags;
	int i;

	for (i = 0; i < DBR_COUNT; ++i) {
		asm volatile (
			"mov dbr[%0]=%1;;\n"
			"mov dbr[%2]=%3;;\n"
			: : "r" (2*i), "r" (0UL), "r" (2*i+1), "r" (0UL));
	}
	asm volatile("srlz.d;;");

	CONTEXT->regs->cr_ipsr &= ~IA64_PSR_DB;
%}

function try_add_debug (sighand:long)
%{
	struct sighand_struct *sighand =
		(struct sighand_struct*)THIS->sighand;
	unsigned long lockaddr = (unsigned long) &sighand->siglock;
	unsigned long flags;
	int i, freeslot;

	spin_lock_irqsave(&dbr_lock, flags);

	freeslot = -1;
	for (i = 0; i < DBR_COUNT; ++i) {
		if (dbr[i] == lockaddr)
			break;
		if (!dbr[i] && freeslot < 0)
			freeslot = i;
	}
	if (i == DBR_COUNT && freeslot >= 0) {
		dbr[freeslot] = lockaddr;
		smp_call_function(load_dbr, NULL, 0);
	}

	spin_unlock_irqrestore(&dbr_lock, flags);
%}

function try_remove_debug (sighand:long)
%{
	struct sighand_struct *sighand =
		(struct sighand_struct*)THIS->sighand;
	unsigned long lockaddr = (unsigned long) &sighand->siglock;
	unsigned long flags;
	int i;

	/* Racy, but best we can get :( */
	if (atomic_read(&sighand->count) > 1)
		return;

	spin_lock_irqsave(&dbr_lock, flags);

	for (i = 0; i < DBR_COUNT; ++i) {
		if (dbr[i] == lockaddr)
			dbr[i] = 0;
	}

	spin_unlock_irqrestore(&dbr_lock, flags);
%}

probe syscall.*
{
	enable_debug();
}

#if 0
probe syscall.*.return
{
	disable_debug();
}
#endif

probe kernel.function("schedule").return
{
	enable_debug ();
}

function is_our_task (task:long)
%{
	struct task_struct *tsk = (struct task_struct*)THIS->task;
	THIS->__retvalue = !IS_ERR(tsk) && !strcmp(tsk->comm, "count");
%}

probe kernel.function("copy_process").return
{
	if (is_our_task($return))
		try_add_debug($return->sighand);
}

probe kernel.function("__cleanup_sighand")
{
	try_remove_debug($sighand);
}

function setup_watch ()
%{
	register_die_notifier(&debug_notifier);
%}

function remove_watch ()
%{
	memset(dbr, 0, sizeof dbr);
	smp_call_function(load_dbr, NULL, 0);
	unregister_die_notifier(&debug_notifier);
%}

probe begin
{
	setup_watch();
}

probe end, error
{
	remove_watch();
}

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03 14:35                                     ` Petr Tesarik
@ 2010-09-03 14:52                                       ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-09-03 14:52 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-ia64, linux-kernel

On Friday 03 of September 2010 16:35:23 Petr Tesarik wrote:
> On Friday 03 of September 2010 11:04:37 Petr Tesarik wrote:
> > [...]
> > I'm now trying to modify the lock primitives:
> >
> > 1. replace the fetchadd4.acq with looping over cmpxchg
>
> I did this and I feel dumber than ever.

One more thing - the crash dump I got from that run shows that CPU 2 was just 
going through zap_page_range(), so it probably also did a few global TLB 
flushes. I'm not sure why this would matter, but any idea is welcome at this 
point.

Anyway, if a global TLB flush is necessary to trigger the bug, it would also 
explain why we couldn't reproduce it in user-space.

OK, I know I'm just wildly guessing (and don't have any explanation for the 
wrap-around mystery) ... but does anybody have a better idea?

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03 14:52                                       ` Petr Tesarik
@ 2010-09-03 15:50                                         ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-09-03 15:50 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel

On Fri, Sep 3, 2010 at 7:52 AM, Petr Tesarik <ptesarik@suse.cz> wrote:
> Anyway, if a global TLB flush is necessary to trigger the bug, it would also
> explain why we couldn't reproduce it in user-space.

Perhaps ... I did explore the TLB: in one variant of my user-mode test I added
a pointer-chasing routine that touched enough pages to clear out the TLB.
Not quite the same as a flush, but close. It didn't help at all.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03 15:50                                         ` Tony Luck
@ 2010-09-06 14:47                                           ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-09-06 14:47 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-ia64, linux-kernel

On Friday 03 of September 2010 17:50:53 Tony Luck wrote:
> On Fri, Sep 3, 2010 at 7:52 AM, Petr Tesarik <ptesarik@suse.cz> wrote:
> > Anyway, if a global TLB flush is necessary to trigger the bug, it would
> > also explain why we couldn't reproduce it in user-space.
>
> Perhaps ... I did explore the TLB in one variant of my user mode test I
> added a pointer-chasing routine that looked at enough pages to clear out
> the TLB. Not quite the same as a flush - but close.  It didn't help at all.

Hi Tony,

I experimented a lot with the code, trying to find a solution, but all in 
vain. I also tried to add a "dep %0=0,%0,15,2" instruction to the cmpxchg4 
loop in __ticket_spin_lock, but it still failed when the ticket wrapped 
around to zero (though now the high word was not even touched).

Replacing the "st2.rel" instruction with a similar cmpxchg4 loop in 
__ticket_spin_unlock did not help either (so we no longer have two accesses 
with different sizes).

What I've seen quite often lately is that the spinlock value is read as "0" by 
the ld4.acq in __ticket_spin_lock(), then as "1" by ld4.acq inside the debug 
fault handler, and then as "0" again by the "cmpxchg4" instruction, i.e. the 
spin lock was actually acquired correctly, but the debug code triggered a 
panic. This made me think that I had an error in my debug code, so I tried 
running that test kernel without the probe, just waiting to see whether the 
kernel hangs. It did hang within 10 minutes (with 6 parallel test case loops 
and a module load/unload loop on another terminal) and produced a crash dump 
that was very similar to all the others.

To sum it up:

1. The ld4.acq and fetchadd.acq instructions fail to give us a coherent view 
of the spinlock memory location.
2. So far, the problem has been observed only after the spinlock value changes 
to zero.
3. It cannot be a random memory scribble, because I employed the DBR registers 
to catch all writes to that memory location.
4. We haven't been able to reproduce the problem in user-space.

Frankly, I think that the processor does not follow the IPF specification, 
hence it is a CPU bug.

But let's be extremely cautious here and re-read the specification once more, 
very carefully. We can still miss some writes to the siglock memory location:

1. if the same physical address is accessible with another virtual address
2. if the siglock location is written by a non-mandatory RSE-spill

Option 2 seems extremely unlikely to me. Option 1 is more plausible, but 
given that I never saw any siglock corruption with a value other than zero, 
it still sounds less likely than a pure CPU bug.

Tony, could you please ask around in Intel if there is any way to debug the 
CPU that would help us spot the real cause?

Petr Tesarik

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-06 14:47                                           ` Petr Tesarik
@ 2010-09-07 13:17                                             ` Petr Tesarik
  -1 siblings, 0 replies; 72+ messages in thread
From: Petr Tesarik @ 2010-09-07 13:17 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-ia64, linux-kernel

Hi all,

On Monday 06 of September 2010 16:47:01 Petr Tesarik wrote:
>[...]
> To sum it up:
>
>[...]
> 2. So far, the problem has been observed only after the spinlock value
> changes to zero.

I thought about this point for a while, and then I decided to test it with 
brute force. Why not simply skip the zero value? If I rearrange the ticket 
fields within the lock word so that the lowest bit is permanently set, the 
lock value iterates over odd numbers only. Unfortunately, the ia64 platform 
does not have a fetchadd4 variant with an increment of 2, so I had to reduce 
the size of the head/tail to 14 bits, but that's still sufficient for all of 
today's machines. Note that I do NOT propose this as a solution, but rather 
as a proof of concept.

After applying the following patch, the test case provided by Hedi has been 
running for a few hours already. Now I'm confident this is a hardware bug.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>

diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index f0816ac..01be28e 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -26,23 +26,28 @@
  * The pad bits in the middle are used to prevent the next_ticket number
  * overflowing into the now_serving number.
  *
- *   31             17  16    15  14                    0
+ *   31             18  17    16  15               2  1 0
  *  +----------------------------------------------------+
- *  |  now_serving     | padding |   next_ticket         |
+ *  |  now_serving     | padding |   next_ticket     | - |
  *  +----------------------------------------------------+
  */
 
-#define TICKET_SHIFT	17
-#define TICKET_BITS	15
+#define TICKET_HSHIFT	2
+#define TICKET_TSHIFT	18
+#define TICKET_BITS	14
 #define	TICKET_MASK	((1 << TICKET_BITS) - 1)
 
+#define __ticket_spin_is_unlocked(ticket, serve) \
+	(!((((serve) >> (TICKET_TSHIFT - TICKET_HSHIFT)) ^ (ticket)) \
+	   & (TICKET_MASK << TICKET_HSHIFT)))
+
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 	int	*p = (int *)&lock->lock, ticket, serve;
 
-	ticket = ia64_fetchadd(1, p, acq);
+	ticket = ia64_fetchadd(1 << TICKET_HSHIFT, p, acq);
 
-	if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
+	if (__ticket_spin_is_unlocked(ticket, ticket))
 		return;
 
 	ia64_invala();
@@ -50,7 +55,7 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 	for (;;) {
 		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(serve) : "r"(p) : "memory");
 
-		if (!(((serve >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
+		if (__ticket_spin_is_unlocked(ticket, serve))
 			return;
 		cpu_relax();
 	}
@@ -60,8 +65,8 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->lock);
 
-	if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))
-		return ia64_cmpxchg(acq, &lock->lock, tmp, tmp + 1, sizeof (tmp)) == tmp;
+	if (__ticket_spin_is_unlocked(tmp, tmp))
+	return ia64_cmpxchg(acq, &lock->lock, tmp, tmp + (1 << TICKET_HSHIFT), sizeof (tmp)) == tmp;
 	return 0;
 }
 
@@ -70,7 +75,7 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	unsigned short	*p = (unsigned short *)&lock->lock + 1, tmp;
 
 	asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
-	ACCESS_ONCE(*p) = (tmp + 2) & ~1;
+	ACCESS_ONCE(*p) = (tmp + (1 << (TICKET_TSHIFT - 16))) & ~1;
 }
 
 static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
@@ -81,7 +86,7 @@ static __always_inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
 
 	for (;;) {
 		asm volatile ("ld4.c.nc %0=[%1]" : "=r"(ticket) : "r"(p) : "memory");
-		if (!(((ticket >> TICKET_SHIFT) ^ ticket) & TICKET_MASK))
+		if (__ticket_spin_is_unlocked(ticket, ticket))
 			return;
 		cpu_relax();
 	}
@@ -91,14 +96,14 @@ static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK);
+	return !__ticket_spin_is_unlocked(tmp, tmp);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
 	long tmp = ACCESS_ONCE(lock->lock);
 
-	return ((tmp - (tmp >> TICKET_SHIFT)) & TICKET_MASK) > 1;
+	return (((tmp >> TICKET_HSHIFT) - (tmp >> TICKET_TSHIFT)) & TICKET_MASK) > 1;
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
diff --git a/arch/ia64/include/asm/spinlock_types.h b/arch/ia64/include/asm/spinlock_types.h
index e2b42a5..4275cd3 100644
--- a/arch/ia64/include/asm/spinlock_types.h
+++ b/arch/ia64/include/asm/spinlock_types.h
@@ -9,7 +9,7 @@ typedef struct {
 	volatile unsigned int lock;
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ 1 }
 
 typedef struct {
 	volatile unsigned int read_counter	: 31;

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-07 13:17                                             ` Petr Tesarik
@ 2010-09-07 17:35                                               ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-09-07 17:35 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel

On Tue, Sep 7, 2010 at 6:17 AM, Petr Tesarik <ptesarik@suse.cz> wrote:
> Anyway, after applying the following patch, the test case provided by Hedi has
> been running for a few hours already. Now, I'm confident this is a hardware
> bug.

Started up a test with this patch on my 32-way to confirm.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-07 17:35                                               ` Tony Luck
@ 2010-09-08 15:55                                                 ` Tony Luck
  -1 siblings, 0 replies; 72+ messages in thread
From: Tony Luck @ 2010-09-08 15:55 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: linux-ia64, linux-kernel

On Tue, Sep 7, 2010 at 10:35 AM, Tony Luck <tony.luck@gmail.com> wrote:
> Started up a test with this patch on my 32-way to confirm.

Still running now (22 hours). So that one works too.

-Tony

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Serious problem with ticket spinlocks on ia64
  2010-09-03 14:35                                     ` Petr Tesarik
@ 2010-09-10  2:55                                       ` Dave Jones
  -1 siblings, 0 replies; 72+ messages in thread
From: Dave Jones @ 2010-09-10  2:55 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Tony Luck, linux-ia64, linux-kernel

On Fri, Sep 03, 2010 at 04:35:23PM +0200, Petr Tesarik wrote:
 
 > I didn't see values around zero on any other CPU in the system. So, either 
 > there is something seriously broken in hardware, or I made a silly mistake in 
 > the monitoring code.

 ...

 > 	memset(dbr, sizeof dbr, 0);
                    ^^^^^^^^^^^^^
swapped arguments.  Perhaps unrelated to the problem, but still silly :)

	Dave


^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2010-09-10  2:55 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-27 13:37 Serious problem with ticket spinlocks on ia64 Petr Tesarik
2010-08-27 13:37 ` Petr Tesarik
2010-08-27 13:48 ` Hedi Berriche
2010-08-27 13:48   ` Hedi Berriche
2010-08-27 14:09   ` Petr Tesarik
2010-08-27 14:09     ` Petr Tesarik
2010-08-27 14:31     ` Hedi Berriche
2010-08-27 14:31       ` Hedi Berriche
2010-08-27 14:40       ` Petr Tesarik
2010-08-27 14:40         ` Petr Tesarik
2010-08-27 14:52         ` Hedi Berriche
2010-08-27 14:52           ` Hedi Berriche
2010-08-27 16:37           ` Petr Tesarik
2010-08-27 16:37             ` Petr Tesarik
2010-08-27 16:08 ` Luck, Tony
2010-08-27 16:08   ` Luck, Tony
2010-08-27 17:16   ` Petr Tesarik
2010-08-27 17:16     ` Petr Tesarik
2010-08-27 18:20     ` Hedi Berriche
2010-08-27 18:20       ` Hedi Berriche
2010-08-27 19:40     ` Petr Tesarik
2010-08-27 19:40       ` Petr Tesarik
2010-08-27 20:29   ` Luck, Tony
2010-08-27 20:29     ` Luck, Tony
2010-08-27 20:41     ` Petr Tesarik
2010-08-27 20:41       ` Petr Tesarik
2010-08-27 21:03     ` Petr Tesarik
2010-08-27 21:03       ` Petr Tesarik
2010-08-27 21:11       ` Luck, Tony
2010-08-27 21:11         ` Luck, Tony
2010-08-27 22:13         ` Petr Tesarik
2010-08-27 22:13           ` Petr Tesarik
2010-08-27 23:26           ` Luck, Tony
2010-08-27 23:26             ` Luck, Tony
2010-08-27 23:55             ` Luck, Tony
2010-08-27 23:55               ` Luck, Tony
2010-08-28  0:28               ` Hedi Berriche
2010-08-28  0:28                 ` Hedi Berriche
2010-08-28  5:01                 ` Luck, Tony
2010-08-28  5:01                   ` Luck, Tony
2010-08-30 18:17                   ` Luck, Tony
2010-08-30 18:17                     ` Luck, Tony
2010-08-30 21:41                     ` Petr Tesarik
2010-08-30 21:41                       ` Petr Tesarik
2010-08-30 22:43                       ` Tony Luck
2010-08-30 22:43                         ` Tony Luck
2010-08-31 22:17                         ` Tony Luck
2010-08-31 22:17                           ` Tony Luck
2010-09-01 23:09                           ` Tony Luck
2010-09-01 23:09                             ` Tony Luck
2010-09-02  0:26                             ` Hedi Berriche
2010-09-02  0:26                               ` Hedi Berriche
2010-09-03  0:06                               ` Tony Luck
2010-09-03  0:06                                 ` Tony Luck
2010-09-03  9:04                                 ` Petr Tesarik
2010-09-03  9:04                                   ` Petr Tesarik
2010-09-03 14:35                                   ` Petr Tesarik
2010-09-03 14:35                                     ` Petr Tesarik
2010-09-03 14:52                                     ` Petr Tesarik
2010-09-03 14:52                                       ` Petr Tesarik
2010-09-03 15:50                                       ` Tony Luck
2010-09-03 15:50                                         ` Tony Luck
2010-09-06 14:47                                         ` Petr Tesarik
2010-09-06 14:47                                           ` Petr Tesarik
2010-09-07 13:17                                           ` Petr Tesarik
2010-09-07 13:17                                             ` Petr Tesarik
2010-09-07 17:35                                             ` Tony Luck
2010-09-07 17:35                                               ` Tony Luck
2010-09-08 15:55                                               ` Tony Luck
2010-09-08 15:55                                                 ` Tony Luck
2010-09-10  2:55                                     ` Dave Jones
2010-09-10  2:55                                       ` Dave Jones
