linux-kernel.vger.kernel.org archive mirror
* BUG: spinlock lockup
@ 2014-01-18  7:25 naveen yadav
  2014-01-20 10:20 ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: naveen yadav @ 2014-01-18  7:25 UTC (permalink / raw)
  To: Russell King - ARM Linux, Catalin Marinas, linux-kernel, will.deacon

Dear All,

We are using a 3.8.x kernel on ARM and we are facing a soft lockup issue.
The logs are as follows:

BUG: spinlock lockup suspected on CPU#0, process1/525
lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1


1. It looks like the lock is available since the owner is -1, so why is
arch_spin_trylock failing?

2. There is a patch, "ARM: spinlock: retry trylock operation if strex
fails on free lock":
http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
In this patch, a loop has been added around "strexeq %2, %0, [%3]"
(the commit message says: "retry the trylock operation if the lock appears
to be free but the strex reported failure").

However, arch_spin_trylock is called by __spin_lock_debug, which already
calls it in a loop, so what purpose does the patch serve?

static void __spin_lock_debug(raw_spinlock_t *lock)
{
        u64 i;
        u64 loops = loops_per_jiffy * HZ;

        for (i = 0; i < loops; i++) {
                if (arch_spin_trylock(&lock->raw_lock))
                        return;
                __delay(1);
        }
        /* lockup suspected: */
        spin_dump(lock, "lockup suspected");
}
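
(For reference, loops_per_jiffy is by definition the number of __delay()
loop iterations per jiffy, so loops = loops_per_jiffy * HZ means this debug
path keeps retrying for roughly one second before printing the "lockup
suspected" dump.)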

3. Is this patch useful to us, and how can we reproduce this scenario?
Scenario: the lock is available, but arch_spin_trylock still returns failure.

Thanks


* Re: BUG: spinlock lockup
  2014-01-18  7:25 BUG: spinlock lockup naveen yadav
@ 2014-01-20 10:20 ` Will Deacon
  2014-01-21  6:37   ` naveen yadav
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-01-20 10:20 UTC (permalink / raw)
  To: naveen yadav; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

On Sat, Jan 18, 2014 at 07:25:51AM +0000, naveen yadav wrote:
> We are using a 3.8.x kernel on ARM and we are facing a soft lockup issue.
> The logs are as follows:

Which CPU/SoC are you using?

> BUG: spinlock lockup suspected on CPU#0, process1/525
> lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
> 
> 
> 1. It looks like the lock is available since the owner is -1, so why is
> arch_spin_trylock failing?

Is this with or without the ticket lock patches? Can you inspect the actual
value of the arch_spinlock_t?

> 2. There is a patch, "ARM: spinlock: retry trylock operation if strex
> fails on free lock":
> http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
> In this patch, a loop has been added around "strexeq %2, %0, [%3]"
> (the commit message says: "retry the trylock operation if the lock appears
> to be free but the strex reported failure").
> 
> However, arch_spin_trylock is called by __spin_lock_debug, which already
> calls it in a loop, so what purpose does the patch serve?

Does this patch help your issue? The purpose of it is to distinguish between
two types of contention:

  (1) The lock is actually taken
  (2) The lock is free, but two people are doing a trylock at the same time

In the case of (2), we do actually want to spin again; otherwise you could
potentially end up in a pathological case where the two CPUs repeatedly
shoot down each other's monitor and forward progress isn't made until the
sequence is broken by something like an interrupt.
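
To make case (2) concrete, here is a minimal userspace sketch of a
ticket-lock trylock with the same retry behaviour (C11 atomics standing in
for the kernel's ldrex/strex assembly; all the names below are made up for
the example and are not kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy ticket lock: low 16 bits = owner, high 16 bits = next. */
struct toy_ticket_lock {
        _Atomic uint32_t slock;
};

static bool toy_ticket_trylock(struct toy_ticket_lock *lock)
{
        for (;;) {
                uint32_t old = atomic_load_explicit(&lock->slock,
                                                    memory_order_relaxed);
                uint16_t owner = old & 0xffff;
                uint16_t next  = old >> 16;

                /* Case (1): the lock is genuinely held -- give up. */
                if (owner != next)
                        return false;

                /* Take the free lock by bumping "next". */
                uint32_t newval = ((uint32_t)(uint16_t)(next + 1) << 16) | owner;
                if (atomic_compare_exchange_weak_explicit(&lock->slock,
                                                          &old, newval,
                                                          memory_order_acquire,
                                                          memory_order_relaxed))
                        return true;

                /*
                 * Case (2): the lock looked free but the update failed
                 * (like strex reporting failure), so retry instead of
                 * reporting contention.
                 */
        }
}

Without the retry, two CPUs doing a trylock on a free lock can keep failing
each other's update and both report the lock as busy, which is the livelock
described above.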

> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
>         u64 i;
>         u64 loops = loops_per_jiffy * HZ;
> 
>         for (i = 0; i < loops; i++) {
>                 if (arch_spin_trylock(&lock->raw_lock))
>                         return;
>                 __delay(1);
>         }
>         /* lockup suspected: */
>         spin_dump(lock, "lockup suspected");
> }
> 
> 3. Is this patch useful to us, and how can we reproduce this scenario?
> Scenario: the lock is available, but arch_spin_trylock still returns failure.

Potentially. Why can't you simply apply the patch and see if it resolves your
issue?

Will


* Re: BUG: spinlock lockup
  2014-01-20 10:20 ` Will Deacon
@ 2014-01-21  6:37   ` naveen yadav
  2014-01-21 10:14     ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: naveen yadav @ 2014-01-21  6:37 UTC (permalink / raw)
  To: Will Deacon; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

Dear Will,

Thanks for your reply,

We are using a Cortex-A15.
Yes, this is with the ticket lock patches.

We will check the value of the arch_spinlock_t and share it. It is a bit
difficult to reproduce this scenario.

If you have any ideas, please suggest how we might reproduce it.

Thanks

On Mon, Jan 20, 2014 at 3:50 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Jan 18, 2014 at 07:25:51AM +0000, naveen yadav wrote:
>> We are using a 3.8.x kernel on ARM and we are facing a soft lockup issue.
>> The logs are as follows:
>
> Which CPU/SoC are you using?


>
>> BUG: spinlock lockup suspected on CPU#0, process1/525
>> lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
>>
>>
>> 1. It looks like the lock is available since the owner is -1, so why is
>> arch_spin_trylock failing?
>
> Is this with or without the ticket lock patches? Can you inspect the actual
> value of the arch_spinlock_t?

>
>> 2. There is a patch, "ARM: spinlock: retry trylock operation if strex
>> fails on free lock":
>> http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
>> In this patch, a loop has been added around "strexeq %2, %0, [%3]"
>> (the commit message says: "retry the trylock operation if the lock appears
>> to be free but the strex reported failure").
>>
>> However, arch_spin_trylock is called by __spin_lock_debug, which already
>> calls it in a loop, so what purpose does the patch serve?
>
> Does this patch help your issue? The purpose of it is to distinguish between
> two types of contention:
>
>   (1) The lock is actually taken
>   (2) The lock is free, but two people are doing a trylock at the same time
>
> In the case of (2), we do actually want to spin again; otherwise you could
> potentially end up in a pathological case where the two CPUs repeatedly
> shoot down each other's monitor and forward progress isn't made until the
> sequence is broken by something like an interrupt.
>
>> static void __spin_lock_debug(raw_spinlock_t *lock)
>> {
>>         u64 i;
>>         u64 loops = loops_per_jiffy * HZ;
>>
>>         for (i = 0; i < loops; i++) {
>>                 if (arch_spin_trylock(&lock->raw_lock))
>>                         return;
>>                 __delay(1);
>>         }
>>         /* lockup suspected: */
>>         spin_dump(lock, "lockup suspected");
>> }
>>
>> 3. Is this patch useful to us, and how can we reproduce this scenario?
>> Scenario: the lock is available, but arch_spin_trylock still returns failure.
>
> Potentially. Why can't you simply apply the patch and see if it resolves your
> issue?
>
> Will


* Re: BUG: spinlock lockup
  2014-01-21  6:37   ` naveen yadav
@ 2014-01-21 10:14     ` Will Deacon
  2014-01-29 10:47       ` naveen yadav
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2014-01-21 10:14 UTC (permalink / raw)
  To: naveen yadav; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

On Tue, Jan 21, 2014 at 06:37:31AM +0000, naveen yadav wrote:
> Thanks for your reply,
> 
> We are using a Cortex-A15.
> Yes, this is with the ticket lock patches.
> 
> We will check the value of the arch_spinlock_t and share it. It is a bit
> difficult to reproduce this scenario.
> 
> If you have any ideas, please suggest how we might reproduce it.

You could try enabling lockdep and see if it catches anything earlier
on.
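
For example, a typical set of lock-debugging options would be (a suggested
.config fragment; exact option dependencies vary between kernel versions):

CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCK_STAT=y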

Will


* Re: BUG: spinlock lockup
  2014-01-21 10:14     ` Will Deacon
@ 2014-01-29 10:47       ` naveen yadav
  0 siblings, 0 replies; 5+ messages in thread
From: naveen yadav @ 2014-01-29 10:47 UTC (permalink / raw)
  To: Will Deacon; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

Dear Will,


Thanks for your input. We debugged this by adding the print below and found
a very large difference (more than 1000) between the ticket next and owner
values, so it seems to be memory corruption: with ticket locks, next - owner
should never exceed the number of CPUs that can be spinning on the lock.



linux/lib/spinlock_debug.c

 		msg, raw_smp_processor_id(),
 		current->comm, task_pid_nr(current));
 	printk(KERN_EMERG " lock: %pS, .magic: %08x, .owner: %s/%d, "
-			".owner_cpu: %d\n",
+		".owner_cpu: %d raw_lock.tickets.next %u raw_lock.tickets.owner %u \n",
 		lock, lock->magic,
 		owner ? owner->comm : "<none>",
 		owner ? task_pid_nr(owner) : -1,
-		lock->owner_cpu);
+		lock->owner_cpu,
+		lock->raw_lock.tickets.next,
+		lock->raw_lock.tickets.owner);
 	dump_stack();
 }
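
As a rough cross-check (a hypothetical helper, not kernel code; it only
reuses the tickets.next/tickets.owner fields of the ARM ticket-lock
arch_spinlock_t printed above):

/*
 * The next/owner distance of a healthy ticket lock can never exceed the
 * number of CPUs that can be queued on it, so a distance in the thousands
 * points at corruption rather than contention.
 */
static bool ticket_looks_corrupted(const raw_spinlock_t *lock)
{
        u16 dist = (u16)(lock->raw_lock.tickets.next -
                         lock->raw_lock.tickets.owner);

        return dist > nr_cpu_ids;
}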


I have one request: is it possible to reorder the structure as below? If
there is any corruption it would be easier to debug, because if the magic
value is corrupted we can detect it quickly.

typedef struct raw_spinlock {

#ifdef CONFIG_DEBUG_SPINLOCK
        /*
         * Debug fields placed ahead of raw_lock, so that corruption which
         * reaches the lock word has likely clobbered ->magic first and is
         * caught by the magic check.
         */
        unsigned int magic, owner_cpu;
        void *owner;
#endif

        arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
        unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        struct lockdep_map dep_map;
#endif
} raw_spinlock_t;

So if this structure gets corrupted, the magic check should catch it before
we ever see a bogus ticket value.

On Tue, Jan 21, 2014 at 3:44 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Jan 21, 2014 at 06:37:31AM +0000, naveen yadav wrote:
>> Thanks for your reply,
>>
>> We are using a Cortex-A15.
>> Yes, this is with the ticket lock patches.
>>
>> We will check the value of the arch_spinlock_t and share it. It is a bit
>> difficult to reproduce this scenario.
>>
>> If you have any ideas, please suggest how we might reproduce it.
>
> You could try enabling lockdep and see if it catches anything earlier
> on.
>
> Will

