linuxppc-dev.lists.ozlabs.org archive mirror
From: Davidlohr Bueso <davidlohr@hp.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	torvalds@linux-foundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	paulus@samba.org, tglx@linutronix.de,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, mingo@kernel.org
Subject: Re: Tasks stuck in futex code (in 3.14-rc6)
Date: Wed, 19 Mar 2014 22:56:59 -0700
Message-ID: <1395295019.2612.11.camel@buesod1.americas.hpqcorp.net>
In-Reply-To: <20140320053350.GB30295@linux.vnet.ibm.com>

On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> > 
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> > 
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
> > 
> 
> Unfortunately the patch didn't help. Still seeing tasks stuck:
> 
> # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> 14680 pts/0    root     java         - 0 futex_wait_queue_me
> 14797 pts/0    root     java         - 0 futex_wait_queue_me
> # :> /var/log/messages
> # echo t > /proc/sysrq-trigger 
> # grep futex_wait_queue_me /var/log/messages | wc -l 
> 334
> #
> 
> [ 6904.211478] Call Trace:
> [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> [ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080
> 
> Is there any other information that I can provide that would help?

This suggests that we missed a wakeup for a task that was adding itself
to the queue in the wait path. The only place that can happen is the hb
spinlock check for pending waiters. Just in case we missed some
assumption about using the hash bucket spinlock as a way of detecting
waiters (powerpc?), could you revert this commit and try the original
atomic operations variant:

https://lkml.org/lkml/2013/12/19/630
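
To make the ordering requirement concrete, here is a rough user-space
sketch of the explicit waiter-counting idea, written with C11 atomics.
It is not the kernel code and not the patch linked above: struct bucket
and all of the function names are made up for illustration, and
sequentially consistent atomics stand in for the kernel's full
barriers. The point is only that the waiter announces itself before
reading the futex word, and the waker publishes the new value before
reading the count, so at least one side must observe the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a futex hash bucket; not the kernel's struct. */
struct bucket {
    atomic_int waiters;   /* explicit count of tasks in the wait path */
    atomic_int futex_val; /* stand-in for the user-space futex word */
};

static void bucket_init(struct bucket *b, int val)
{
    atomic_init(&b->waiters, 0);
    atomic_init(&b->futex_val, val);
}

/*
 * Wait path: bump the counter first, then re-check the futex word.
 * Both operations are seq_cst, so the increment is ordered before
 * the load. Returns true if the caller should go to sleep.
 */
static bool waiter_should_sleep(struct bucket *b, int expected)
{
    atomic_fetch_add(&b->waiters, 1);
    if (atomic_load(&b->futex_val) != expected) {
        /* Value already changed: back out instead of sleeping. */
        atomic_fetch_sub(&b->waiters, 1);
        return false;
    }
    return true; /* still counted as a waiter until waiter_done() */
}

/* Called once the waiter has been woken and leaves the queue. */
static void waiter_done(struct bucket *b)
{
    int prev = atomic_fetch_sub(&b->waiters, 1);
    assert(prev > 0);
    (void)prev;
}

/*
 * Wake path: publish the new value, then look at the counter.
 * Returns true if the waker has to take the slow path and wake someone.
 */
static bool waker_must_wake(struct bucket *b, int new_val)
{
    atomic_store(&b->futex_val, new_val);
    return atomic_load(&b->waiters) > 0;
}
```

With this ordering, either the waiter sees the new value and backs out,
or the waker sees a non-zero count and does the wakeup. A check that
infers "someone is waiting" from the lock state instead only works if
that state carries the same ordering guarantee on every architecture.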

