* BUG: spinlock lockup
From: naveen yadav @ 2014-01-18  7:25 UTC (permalink / raw)
To: Russell King - ARM Linux, Catalin Marinas, linux-kernel, will.deacon

Dear All,

We are using a 3.8.x kernel on ARM, and we are facing a soft-lockup issue.
The logs are as follows:

BUG: spinlock lockup suspected on CPU#0, process1/525
 lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1

1. The lock looks free, since the owner is -1. Why does arch_spin_trylock
   keep failing?

2. There is a patch: "ARM: spinlock: retry trylock operation if strex
   fails on free lock"
   http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913

   This patch adds a loop around "strexeq %2, %0, [%3]" (commit message:
   "retry the trylock operation if the lock appears to be free but the
   strex reported failure").

   However, arch_spin_trylock is called by __spin_lock_debug, which
   already calls it in a loop. What purpose does the patch serve, then?

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;

	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);
	}
	/* lockup suspected: */
	spin_dump(lock, "lockup suspected");
}

3. Is this patch useful for us? How can we reproduce this scenario
   (the lock is available, but arch_spin_trylock returns failure)?

Thanks
* Re: BUG: spinlock lockup
From: Will Deacon @ 2014-01-20 10:20 UTC (permalink / raw)
To: naveen yadav; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

On Sat, Jan 18, 2014 at 07:25:51AM +0000, naveen yadav wrote:
> We are using a 3.8.x kernel on ARM, and we are facing a soft-lockup
> issue. The logs are as follows:

Which CPU/SoC are you using?

> BUG: spinlock lockup suspected on CPU#0, process1/525
>  lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
>
> 1. The lock looks free, since the owner is -1. Why does arch_spin_trylock
>    keep failing?

Is this with or without the ticket lock patches? Can you inspect the actual
value of the arch_spinlock_t?

> 2. There is a patch: "ARM: spinlock: retry trylock operation if strex
>    fails on free lock"
>    http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
>
>    This patch adds a loop around "strexeq %2, %0, [%3]" (commit message:
>    "retry the trylock operation if the lock appears to be free but the
>    strex reported failure").
>
>    However, arch_spin_trylock is called by __spin_lock_debug, which
>    already calls it in a loop. What purpose does the patch serve, then?

Does this patch help your issue? The purpose of it is to distinguish between
two types of contention:

  (1) The lock is actually taken
  (2) The lock is free, but two people are doing a trylock at the same time

In the case of (2), we do actually want to spin again, otherwise you could
potentially end up in a pathological case where the two CPUs repeatedly
shoot down each other's monitor and forward progress isn't made until the
sequence is broken by something like an interrupt.
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> 	u64 i;
> 	u64 loops = loops_per_jiffy * HZ;
>
> 	for (i = 0; i < loops; i++) {
> 		if (arch_spin_trylock(&lock->raw_lock))
> 			return;
> 		__delay(1);
> 	}
> 	/* lockup suspected: */
> 	spin_dump(lock, "lockup suspected");
> }
>
> 3. Is this patch useful for us? How can we reproduce this scenario
>    (the lock is available, but arch_spin_trylock returns failure)?

Potentially. Why can't you simply apply the patch and see if it resolves your
issue?

Will
* Re: BUG: spinlock lockup
From: naveen yadav @ 2014-01-21  6:37 UTC (permalink / raw)
To: Will Deacon; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

Dear Will,

Thanks for your reply.

We are using a Cortex-A15. Yes, this is with the ticket lock.

We will check the value of the arch_spinlock_t and share it. It is a bit
difficult to reproduce this scenario; if you have any ideas, please
suggest how to reproduce it.

Thanks

On Mon, Jan 20, 2014 at 3:50 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Sat, Jan 18, 2014 at 07:25:51AM +0000, naveen yadav wrote:
>> We are using a 3.8.x kernel on ARM, and we are facing a soft-lockup
>> issue. The logs are as follows:
>
> Which CPU/SoC are you using?
>
>> BUG: spinlock lockup suspected on CPU#0, process1/525
>>  lock: 0xd8ac9a64, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
>>
>> 1. The lock looks free, since the owner is -1. Why does arch_spin_trylock
>>    keep failing?
>
> Is this with or without the ticket lock patches? Can you inspect the actual
> value of the arch_spinlock_t?
>
>> 2. There is a patch: "ARM: spinlock: retry trylock operation if strex
>>    fails on free lock"
>>    http://permalink.gmane.org/gmane.linux.ports.arm.kernel/240913
>>
>>    This patch adds a loop around "strexeq %2, %0, [%3]" (commit message:
>>    "retry the trylock operation if the lock appears to be free but the
>>    strex reported failure").
>>
>>    However, arch_spin_trylock is called by __spin_lock_debug, which
>>    already calls it in a loop. What purpose does the patch serve, then?
>
> Does this patch help your issue?
> The purpose of it is to distinguish between two types of contention:
>
>   (1) The lock is actually taken
>   (2) The lock is free, but two people are doing a trylock at the same time
>
> In the case of (2), we do actually want to spin again, otherwise you could
> potentially end up in a pathological case where the two CPUs repeatedly
> shoot down each other's monitor and forward progress isn't made until the
> sequence is broken by something like an interrupt.
>
>> static void __spin_lock_debug(raw_spinlock_t *lock)
>> {
>> 	u64 i;
>> 	u64 loops = loops_per_jiffy * HZ;
>>
>> 	for (i = 0; i < loops; i++) {
>> 		if (arch_spin_trylock(&lock->raw_lock))
>> 			return;
>> 		__delay(1);
>> 	}
>> 	/* lockup suspected: */
>> 	spin_dump(lock, "lockup suspected");
>> }
>>
>> 3. Is this patch useful for us? How can we reproduce this scenario
>>    (the lock is available, but arch_spin_trylock returns failure)?
>
> Potentially. Why can't you simply apply the patch and see if it resolves
> your issue?
>
> Will
* Re: BUG: spinlock lockup
From: Will Deacon @ 2014-01-21 10:14 UTC (permalink / raw)
To: naveen yadav; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

On Tue, Jan 21, 2014 at 06:37:31AM +0000, naveen yadav wrote:
> Thanks for your reply.
>
> We are using a Cortex-A15. Yes, this is with the ticket lock.
>
> We will check the value of the arch_spinlock_t and share it. It is a bit
> difficult to reproduce this scenario.
>
> If you have any ideas, please suggest how to reproduce it.

You could try enabling lockdep and see if it catches anything earlier on.

Will
* Re: BUG: spinlock lockup
From: naveen yadav @ 2014-01-29 10:47 UTC (permalink / raw)
To: Will Deacon; +Cc: Russell King - ARM Linux, Catalin Marinas, linux-kernel

Dear Will,

Thanks for your input.

We debugged this by adding the print below, and found a very large
difference between next and owner (more than 1000). So it seems to be
memory corruption.

In linux/lib/spinlock_debug.c:

 		msg, raw_smp_processor_id(),
 		current->comm, task_pid_nr(current));
 	printk(KERN_EMERG " lock: %pS, .magic: %08x, .owner: %s/%d, "
-			".owner_cpu: %d\n",
+			".owner_cpu: %d raw_lock.tickets.next %u "
+			"raw_lock.tickets.owner %u\n",
 		lock, lock->magic, owner ? owner->comm : "<none>",
 		owner ? task_pid_nr(owner) : -1,
-		lock->owner_cpu);
+		lock->owner_cpu,
+		lock->raw_lock.tickets.next,
+		lock->raw_lock.tickets.owner);
 	dump_stack();
 }

I have one request: is it possible to reorder the structure as below? If
anything corrupts the structure, it would be easy to debug, because an
overrun hits the magic field first and we can spot it quickly.

typedef struct raw_spinlock {
#ifdef CONFIG_DEBUG_SPINLOCK
	unsigned int magic, owner_cpu;
	void *owner;
#endif
	arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
	unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map dep_map;
#endif
} raw_spinlock_t;

On Tue, Jan 21, 2014 at 3:44 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Jan 21, 2014 at 06:37:31AM +0000, naveen yadav wrote:
>> Thanks for your reply.
>>
>> We are using a Cortex-A15. Yes, this is with the ticket lock.
>>
>> We will check the value of the arch_spinlock_t and share it. It is a bit
>> difficult to reproduce this scenario.
>>
>> If you have any ideas, please suggest how to reproduce it.
>
> You could try enabling lockdep and see if it catches anything earlier
> on.
>
> Will