* Re: Help me: deadlock reported in pthread_mutex_lock
From: Yimin Deng @ 2019-03-12 13:28 UTC
  To: Chinmay V S; +Cc: linux-rt-users

Hi CVS,

I found nothing related in the latest CPU errata.

It's difficult to reproduce the issue, and to debug glibc, which the
third party provides only in binary form. According to the coredump,
the mutex seems to become free after a while. So as a workaround I
changed that mutex's attribute to PTHREAD_MUTEX_PI_NORMAL_NP (the mutex
is shared between two cores on SMP) and replaced pthread_mutex_lock()
with a loop that calls pthread_mutex_timedlock() again as long as it
returns ETIMEDOUT. I hope it will work, but it will take a long time to
verify whether it's OK.
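A minimal sketch of that workaround, under stated assumptions.
PTHREAD_MUTEX_PI_NORMAL_NP is, as far as I know, a glibc-internal mutex
kind that <pthread.h> does not export, so the sketch assumes the public
equivalent is a PTHREAD_MUTEX_NORMAL mutex with the PTHREAD_PRIO_INHERIT
protocol. The helper names init_pi_normal_mutex() and lock_with_retry()
and the per-attempt timeout parameter are hypothetical. If the real
problem is a lost wake-up (the lock word is free but the waiter is never
woken), each timeout gives the thread another chance to see the free
lock.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Create a normal (non-recursive) mutex that uses priority inheritance.
 * Assumption: this public-API combination corresponds to glibc's internal
 * PTHREAD_MUTEX_PI_NORMAL_NP kind. Returns 0 or an error number. */
int init_pi_normal_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    if (rc == 0)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);  /* may fail with ENOTSUP */
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock m, retrying pthread_mutex_timedlock() for as long as it reports
 * ETIMEDOUT. timeout_ms (hypothetical parameter, assumed > 0) bounds each
 * attempt; the deadline is absolute and measured on CLOCK_REALTIME, as
 * pthread_mutex_timedlock() expects. Returns 0 or a non-timeout error. */
int lock_with_retry(pthread_mutex_t *m, long timeout_ms)
{
    for (;;) {
        struct timespec deadline;
        if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
            return errno;
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {   /* normalize */
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        int rc = pthread_mutex_timedlock(m, &deadline);
        if (rc != ETIMEDOUT)
            return rc;
        /* Timed out: a place to log or count retries before trying again. */
    }
}
```

Without an upper bound on retries this still blocks forever if the mutex
is genuinely held; its main value is the hook between attempts, where
diagnostics can be recorded.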

Thanks again for your concern!

B.R.
Yimin

> Hi CVS,
>
> I sincerely appreciate your effort on this issue!
>
> > > I cannot imagine a scenario that would lead to 3 different values of
> > > the same variable mutex->__data.__lock being seen at 3 positions.
> > Is there a common pattern to these 3 different values?
> > - Do they look like memory bit-flips (if yes, then misbehaviour /
> > misconfigured HW?)
>
> For example, when ThreadA's tid is 0x1100, the 'oldval' on the stack
> is 0x11c1 (the tid of ThreadB, running on another CPU). When ThreadA's
> tid is 0x1000, the 'oldval' is 0x1057. So the difference between these
> values does not look like a bit flip. Which value did you mean was
> bit-flipped?
>
> > - Do they always contain the same pattern (if yes, then
> > buffer-overflow? Add "guard-bytes" around the struct and attempt to
> > reproduce the behaviour)
>
> In at least one occurrence, the variable at address
> (&(mutex->__data.__lock) - 8) (a global variable holding the value of
> sysconf(_SC_NPROCESSORS_ONLN)) is correct, i.e. not overwritten.
> (&(mutex->__data.__lock) - 4) is padding (i.e. not actually used), and
> its value in the coredump is 0. According to the coredump,
> mutex->__data.__kind (located after &(mutex->__data.__lock)) is also
> correct. I cannot add guard bytes inside struct pthread_mutex_t,
> because glibc is part of the toolchain provided by a third party.
>
>
> > Slightly tangential.
> > Have you checked for any unpatched HW errata in CPU, cache controller,
> > memory controller
> > that can cause writes posted by the CPU to be lost in some rare scenarios?
>
> I have asked for the latest version of the errata.
>
> I'll update you if there's any progress.
>
> B.R.
> Yimin
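The bit-flip question can be checked mechanically: XOR the expected
lock-word value (ThreadA's tid) with the observed 'oldval' and count the
set bits. A single-bit memory error would give a distance of exactly 1.
The helper name bit_distance() is hypothetical.

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ (Hamming distance).
 * A single flipped bit in memory would give a result of exactly 1. */
unsigned bit_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;   /* set bits mark the differing positions */
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;       /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For the two cases above, bit_distance(0x1100, 0x11c1) is 3 and
bit_distance(0x1000, 0x1057) is 5, so neither observation is a single
bit flip.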
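Even without changing glibc's struct pthread_mutex_t, guard bytes can be
placed around it by embedding the mutex in an application-defined struct.
This is a sketch of that canary technique; struct guarded_mutex, its
functions and the GUARD_PATTERN value are all hypothetical. It only
detects writes that spill past the mutex boundaries; since the
neighbouring variables were intact in the coredump, it may not catch a
stray write aimed at the lock word itself.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary canary value (hypothetical choice). */
#define GUARD_PATTERN UINT64_C(0xDEADBEEFCAFEF00D)

/* Application-side wrapper: one canary word on each side of the glibc
 * mutex, so an overrun in either direction is likely to hit a canary.
 * The glibc type itself is left unchanged. */
struct guarded_mutex {
    volatile uint64_t front_guard;
    pthread_mutex_t mutex;
    volatile uint64_t back_guard;
};

/* Set both canaries, then initialize the embedded mutex.
 * Returns the result of pthread_mutex_init(). */
int guarded_mutex_init(struct guarded_mutex *g, const pthread_mutexattr_t *attr)
{
    g->front_guard = GUARD_PATTERN;
    g->back_guard = GUARD_PATTERN;
    return pthread_mutex_init(&g->mutex, attr);
}

/* True while both canaries still hold GUARD_PATTERN. Checking this before
 * each lock and after each unlock narrows down when an overwrite happened. */
bool guarded_mutex_intact(const struct guarded_mutex *g)
{
    return g->front_guard == GUARD_PATTERN && g->back_guard == GUARD_PATTERN;
}
```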


* Re: Help me: deadlock reported in pthread_mutex_lock
From: Yimin Deng @ 2019-03-12 13:37 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

Thank you for your concern!

We also suspect that something is wrong with the atomic operations or
with cache coherency on SMP. But it's really hard to confirm or debug,
because the issue is so difficult to reproduce.
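One way to probe whether compare-and-swap behaves atomically on the
target SMP hardware is a stress test in which several threads increment a
shared counter only through CAS and then check that no increments were
lost. This is a hypothetical sketch using C11 <stdatomic.h>, not glibc's
internal lock code, so it exercises similar but not necessarily identical
instructions. Because the reported failure takes months to appear, a
clean run proves little, but any lost increment would be a strong signal.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared counter, modified only via compare-and-swap. */
static atomic_uint_fast64_t counter;
static long iterations_per_thread;

static void *cas_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        uint_fast64_t old = atomic_load(&counter);
        /* On failure, 'old' is reloaded with the current value; retry. */
        while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
            ;
    }
    return NULL;
}

/* Run nthreads workers (1..64), each doing 'iterations' CAS increments.
 * Returns the number of lost increments (0 expected), or -1 on error. */
long long run_cas_stress(int nthreads, long iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0)
        return -1;
    atomic_store(&counter, 0);
    iterations_per_thread = iterations;   /* published by pthread_create */

    int started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, cas_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)     /* always join what was started */
        pthread_join(tids[i], NULL);
    if (started != nthreads)
        return -1;

    long long expected = (long long)nthreads * iterations;
    return expected - (long long)atomic_load(&counter);
}
```

In practice this would be pinned to the two cores that share the mutex
and left running for a long time.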

B.R.
Yimin

> On 2019-02-14 16:09:48 [+0800], Yimin Deng wrote:
> > I cannot imagine a scenario that would lead to 3 different values of
> > the same variable mutex->__data.__lock being seen at 3 positions.
> > The issue is very difficult to reproduce (it takes from about one to
> > several months to reproduce it once), and we failed to reproduce it
> > with a small application.
>
> what are the chances that the atomic operations are not so atomic?
>
> > B.R.
> > Yimin
>
> Sebastian

