From: Austin Schuh <austin@peloton-tech.com>
To: John Blackwood <john.blackwood@ccur.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>,
	linux-kernel@vger.kernel.org, xfs <xfs@oss.sgi.com>,
	linux-rt-users@vger.kernel.org
Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
Date: Wed, 21 May 2014 14:59:30 -0700
Message-ID: <CANGgnMYCFVQ7aR2Qjo+c5B=Jyi3QfyhAHd6y3imSN_7a3Y4Ekg@mail.gmail.com>
In-Reply-To: <537CFECF.9070701@ccur.com>

On Wed, May 21, 2014 at 12:30 PM, John Blackwood
<john.blackwood@ccur.com> wrote:
>> Date: Wed, 21 May 2014 03:33:49 -0400
>> From: Richard Weinberger <richard.weinberger@gmail.com>
>> To: Austin Schuh <austin@peloton-tech.com>
>> CC: LKML <linux-kernel@vger.kernel.org>, xfs <xfs@oss.sgi.com>, rt-users
>>       <linux-rt-users@vger.kernel.org>
>> Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
>
>>
>> CC'ing RT folks
>>
>> On Wed, May 21, 2014 at 8:23 AM, Austin Schuh <austin@peloton-tech.com>
>> wrote:
>> > > On Tue, May 13, 2014 at 7:29 PM, Austin Schuh
>> > > <austin@peloton-tech.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
>> >> >> patched kernel.  I have currently only triggered it using dpkg.
>> >> >> Dave Chinner on the XFS mailing list suggested that it was an
>> >> >> rt-kernel workqueue issue as opposed to an XFS problem after
>> >> >> looking at the kernel messages.
>> >> >>
>> >> >> The only modification to the kernel besides the RT patch is that I
>> >> >> have applied tglx's "genirq: Sanitize spurious interrupt detection
>> >> >> of threaded irqs" patch.
>> > >
>> > > I upgraded to 3.14.3-rt4, and the problem still persists.
>> > >
>> > > I turned on event tracing and tracked it down further.  I'm able to
>> > > lock it up by scping a new kernel debian package to /tmp/ on the
>> > > machine.  scp is locking the inode, and then scheduling
>> > > xfs_bmapi_allocate_worker in the work queue.  The work then never gets
>> > > run.  The kworkers then lock up waiting for the inode lock.
>> > >
>> > > Here are the relevant events from the trace.  ffff8803e9f10288
>> > > (blk_delay_work) gets run later on in the trace, but ffff8803b4c158d0
>> > > (xfs_bmapi_allocate_worker) never does.  The kernel then warns about
>> > > blocked tasks 120 seconds later.
>
> Austin and Richard,
>
> I'm not 100% sure that the patch below will fix your problem, but we
> saw something that sounds pretty similar to your issue involving the
> nvidia driver and the preempt-rt patch.  The nvidia driver uses the
> completion support to build its own notion of an internally used
> semaphore.
>
> Some tasks were failing to ever wake up from wait_for_completion() calls
> due to a race in the underlying do_wait_for_common() routine.

Hi John,

Thanks for the suggestion and patch.  The issue is that the work never
gets run, not that the work finishes but the waiter never gets woken.
I applied it anyway to see if it would help, but I still get the lockup.

Thanks,
    Austin
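
The distinction drawn above (queued work that never starts, versus work
that finishes but whose waiter is never woken) can be illustrated with a
small user-space model.  This is only a sketch under stated assumptions:
it uses Python threads, not the kernel workqueue or completion code, and
every name in it (simulate, blocked_on_inode, allocate_done, and so on)
is hypothetical.  It assumes a single worker that is tied up waiting for
a lock the submitter still holds, which is a simplification of the
symptom described in the trace rather than a diagnosis of its cause.

```python
import queue
import threading


def simulate(wait_seconds=0.2):
    """Toy model: one worker, one lock, and a queued item that cannot start."""
    inode_lock = threading.Lock()       # stands in for the inode lock
    work_queue = queue.Queue()          # stands in for the workqueue
    started_items = []                  # names of items the worker began running
    stop = threading.Event()
    allocate_done = threading.Event()   # plays the role of a completion

    def worker():
        # The only worker: runs queued items strictly in order.
        while not stop.is_set():
            try:
                name, fn = work_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            started_items.append(name)
            fn()

    def blocked_on_inode():
        # Blocks until the submitter releases the lock.
        with inode_lock:
            pass

    inode_lock.acquire()                # submitter takes the lock first
    thread = threading.Thread(target=worker)
    thread.start()
    work_queue.put(("blocked_on_inode", blocked_on_inode))  # ties up the worker
    work_queue.put(("allocate", allocate_done.set))         # queued behind it
    woken = allocate_done.wait(wait_seconds)    # waiter gives up after a timeout
    started = "allocate" in started_items       # did the work ever begin?
    inode_lock.release()                        # let the model unwind cleanly
    stop.set()
    thread.join()
    return woken, started


if __name__ == "__main__":
    woken, started = simulate()
    print("waiter woken:", woken, "| work started:", started)
```

In this model the waiter times out and the "allocate" item is never
started, so a fix to the wake-up path alone would not be expected to
change the outcome.  A lost wake-up would instead show the item as
started and finished while the waiter is still blocked.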
